Can Utility Scripts Do Batch Hashing?

wen Utility Scripts 2026-06-09 12

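Contents of this article:

Can Utility Scripts Do Batch Hashing?

Core idea
Batch-hashing files (the most common case)
Batch-hashing text strings (for password cracking/testing)
Batch hashing + deduplication (finding duplicate files)
Performance tips (beginner-friendly)
Ready-to-use commands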

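Yes. Batch hashing is a very typical scripting task: whether the goal is verifying file integrity, deduplicating data, security auditing, or password cracking, ready-made utility scripts exist.

Below are practical batch-hashing scripts for several scenarios, with examples in Python and Shell.

Core idea

The core of batch hashing is to walk a folder or a list of files, call a hash function on each file, and write the result (file name, path, hash value) to a file or to the terminal.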

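Batch-hashing files (the most common case)

Scenario: you downloaded a software package or dataset containing several hundred files and need to verify that each file's SHA256 matches the officially published value.

Python script (cross-platform)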

import hashlib
import os
import sys


def get_file_hash(filepath, algorithm='sha256', buffer_size=65536):
    """Compute the hash of a single file."""
    h = hashlib.new(algorithm)
    with open(filepath, 'rb') as f:
        # Read fixed-size chunks so large files never have to fit in memory
        while chunk := f.read(buffer_size):
            h.update(chunk)
    return h.hexdigest()


def batch_hash_directory(directory, algorithm='sha256', output_file='hashes.txt'):
    """Hash every file under a directory and save the results."""
    if not os.path.isdir(directory):
        print(f"Error: {directory} is not a valid directory")
        return
    try:
        hashlib.new(algorithm)  # fail once, up front, on an unknown algorithm name
    except ValueError:
        print(f"Error: unsupported algorithm {algorithm}")
        return
    output_abs = os.path.abspath(output_file)
    results = []
    for root, dirs, files in os.walk(directory):
        for filename in files:
            filepath = os.path.join(root, filename)
            if os.path.abspath(filepath) == output_abs:
                continue  # do not hash a results file left over from an earlier run
            relative_path = os.path.relpath(filepath, directory)
            try:
                file_hash = get_file_hash(filepath, algorithm)
            except OSError as e:
                print(f"✗ Failed to read {filepath}: {e}")
                continue
            results.append(f"{relative_path}: {file_hash}")
            print(f"✓ {relative_path}")
    # Write the results file (UTF-8 so non-ASCII paths survive on every platform)
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(f"# Directory: {directory}\n")
        f.write(f"# Algorithm: {algorithm}\n")
        f.write("\n".join(results) + "\n")
    print(f"\nResults saved to: {output_file}")


if __name__ == "__main__":
    # Usage: python hash_dir.py /path/to/your/dir
    if len(sys.argv) < 2:
        print("Usage: python hash_dir.py <directory> [algorithm] [output file]")
        sys.exit(1)
    target_dir = sys.argv[1]
    algo = sys.argv[2] if len(sys.argv) > 2 else 'sha256'
    out_file = sys.argv[3] if len(sys.argv) > 3 else 'hashes.txt'
    batch_hash_directory(target_dir, algo, out_file)
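
The scenario above calls for comparing each file against officially published hashes, and the script covers only half of that: it generates hashes but does not check them. A minimal sketch of the checking step follows, assuming the official list uses the `sha256sum` text format (one `<hex digest>  <relative path>` entry per line). The function names `parse_manifest` and `verify_manifest` are made up for illustration, and escaped file names (which `sha256sum` emits for names containing newlines) are not handled.

```python
import hashlib
import os


def sha256_of(filepath, buffer_size=65536):
    """SHA256 of one file, read in chunks."""
    h = hashlib.sha256()
    with open(filepath, 'rb') as f:
        while chunk := f.read(buffer_size):
            h.update(chunk)
    return h.hexdigest()


def parse_manifest(text):
    """Parse sha256sum-style lines into {relative_path: expected_hex}."""
    expected = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith('#'):
            continue  # skip blank lines and comments
        digest, sep, rest = line.partition(' ')
        if not sep or len(rest) < 2:
            continue  # not a "<digest>  <path>" line
        # rest starts with the mode character: ' ' (text) or '*' (binary)
        expected[rest[1:]] = digest.lower()
    return expected


def verify_manifest(base_dir, expected):
    """Return (ok, mismatched, missing) lists of relative paths."""
    ok, mismatched, missing = [], [], []
    for rel_path, want in expected.items():
        full_path = os.path.join(base_dir, rel_path)
        if not os.path.isfile(full_path):
            missing.append(rel_path)
        elif sha256_of(full_path) == want:
            ok.append(rel_path)
        else:
            mismatched.append(rel_path)
    return ok, mismatched, missing
```

To use it, read the official checksum file as text, pass it to `parse_manifest`, and hand the result to `verify_manifest` together with the download directory. Any entry in the second or third list means the download should not be trusted as-is.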

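Shell one-liner (Linux/Mac)

The classic approach does it in a single line:

# Recursively compute SHA256 for every file under the current directory
# (the output file is excluded so that it does not end up hashing itself)
find . -type f ! -name checksums.sha256 -exec sha256sum {} + > checksums.sha256

MD5 works the same way, but it is only suitable for catching accidental corruption, because MD5 collisions can be produced deliberately:

find . -type f ! -name checksums.md5 -exec md5sum {} + > checksums.md5

sha256sum and md5sum are GNU coreutils tools. On macOS, where they may not be installed, `shasum -a 256` is the usual substitute for sha256sum.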

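Batch-hashing text strings (for password cracking/testing)

Scenario: you have a password dictionary with hundreds of thousands of entries and want to quickly convert each password into several hash formats (such as MD5, SHA1, BCrypt). The script below covers MD5 and SHA256. BCrypt is not part of Python's hashlib and uses a per-password salt, so it needs a separate library and cannot be precomputed as a single value per password.

Python script (efficient, uses multiple processes)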

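import hashlib
from multiprocessing import Pool


def hash_password(password):
    """Example: convert one password to MD5 and SHA256."""
    # Strip only the line ending; leading/trailing spaces can be part of a password
    pwd = password.rstrip('\r\n')
    data = pwd.encode('utf-8')
    md5_hash = hashlib.md5(data).hexdigest()
    sha256_hash = hashlib.sha256(data).hexdigest()
    # Note: a password that itself contains ':' makes this line ambiguous to parse
    return f"{pwd}:{md5_hash}:{sha256_hash}"


def batch_hash_passwords(input_file, output_file, num_processes=8):
    """Hash every line of a dictionary file using a pool of worker processes."""
    # errors='ignore' silently drops bytes that are not valid UTF-8
    with open(input_file, 'r', encoding='utf-8', errors='ignore') as f:
        passwords = f.readlines()
    with Pool(num_processes) as pool:
        # A large chunksize keeps inter-process overhead low for such tiny tasks
        results = pool.map(hash_password, passwords, chunksize=10000)
    with open(output_file, 'w', encoding='utf-8') as f:
        f.write("\n".join(results) + "\n")
    print(f"Done: {len(passwords)} lines processed, results saved to {output_file}")


if __name__ == "__main__":
    batch_hash_passwords("passwords.txt", "hashed_passwords.txt")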
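
MD5 and SHA256 map the same password to the same digest every time, which is what makes a precomputed list like the one above usable. BCrypt, mentioned in the scenario, works differently: each hash embeds a random salt and a work factor, so a password has no single BCrypt value. BCrypt is not in Python's standard library, so the sketch below uses `hashlib.pbkdf2_hmac` as a stand-in to illustrate the same salted, deliberately slow design. The function names, the `salt_hex:digest_hex` storage format, and the iteration count are illustrative assumptions, not a recommendation.

```python
import hashlib
import hmac
import os


def slow_hash(password, salt=None, iterations=100_000):
    """Salted PBKDF2-HMAC-SHA256; returns 'salt_hex:digest_hex'."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for every password
    digest = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, iterations)
    return f"{salt.hex()}:{digest.hex()}"


def check_password(candidate, stored, iterations=100_000):
    """Re-hash a candidate with the stored salt and compare in constant time."""
    salt_hex, _ = stored.split(':')
    recomputed = slow_hash(candidate, bytes.fromhex(salt_hex), iterations)
    return hmac.compare_digest(recomputed, stored)


if __name__ == "__main__":
    first = slow_hash("hunter2")
    second = slow_hash("hunter2")
    print(first == second)                   # False: the two salts differ
    print(check_password("hunter2", first))  # True
    print(check_password("wrong", first))    # False
```

Because every candidate has to be re-hashed with each stored salt, batch work against salted hashes grows with the number of candidates times the number of stored hashes, instead of being a one-time table build.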

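Batch hashing + deduplication (finding duplicate files)

Scenario: your disk holds too many pictures and videos, and you want to use hashes to find files whose contents are exactly identical.

Python script (size prefilter + hash confirmation)

This script is more practical: it groups files by size first and hashes only files that share a size with another file, so most files are never read at all.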

import hashlib
import os
from collections import defaultdict


def file_md5(filepath, buffer_size=65536):
    """MD5 of the whole file, read in chunks."""
    h = hashlib.md5()
    with open(filepath, 'rb') as f:
        while chunk := f.read(buffer_size):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root_dir):
    """Find all duplicate files under a directory."""
    size_map = defaultdict(list)
    hash_map = defaultdict(list)
    # Pass 1: group files by size
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            filepath = os.path.join(dirpath, name)
            if os.path.isfile(filepath) and not os.path.islink(filepath):
                size_map[os.path.getsize(filepath)].append(filepath)
    # Pass 2: hash only the files that share a size with another file
    for size, files in size_map.items():
        if len(files) < 2:
            continue  # a unique size cannot have a duplicate
        for filepath in files:
            try:
                # Hash the whole file: hashing only the first 64 KB would
                # report files that merely share a prefix as duplicates
                file_hash = file_md5(filepath)
            except OSError as e:
                print(f"Skipping unreadable file {filepath}: {e}")
                continue
            hash_map[(size, file_hash)].append(filepath)
    # Report duplicates
    print("\nDuplicate files found:")
    duplicates_found = False
    for (size, hash_val), file_list in hash_map.items():
        if len(file_list) > 1:
            duplicates_found = True
            print(f"\nHash: {hash_val} ({len(file_list)} copies)")
            for file_path in file_list:
                print(f"    {file_path}")
    if not duplicates_found:
        print("No duplicate files found.")


if __name__ == "__main__":
    find_duplicates("/path/to/your/folder")
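
The size-first idea can be taken one step further: within each same-size group, hash only the first 64 KB to split the group cheaply, then run a full-file hash only on files whose prefixes also match. The following is a sketch under that assumption, not a tuned tool. `find_duplicates_staged` and its helpers are hypothetical names, and SHA256 is used in place of MD5 purely as a choice for this example.

```python
import hashlib
import os
from collections import defaultdict

PREFIX_BYTES = 65536  # how much of each file the cheap middle stage reads


def hash_file(filepath, limit=None, buffer_size=65536):
    """SHA256 of a file; if limit is set, only the first `limit` bytes."""
    h = hashlib.sha256()
    remaining = limit
    with open(filepath, 'rb') as f:
        while True:
            n = buffer_size if remaining is None else min(buffer_size, remaining)
            chunk = f.read(n) if n else b''
            if not chunk:
                break
            h.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return h.hexdigest()


def group_by(paths, key_func):
    """Bucket paths by key_func(path); keep only buckets with 2+ members."""
    buckets = defaultdict(list)
    for p in paths:
        try:
            buckets[key_func(p)].append(p)
        except OSError:
            continue  # unreadable or vanished file: leave it out
    return [group for group in buckets.values() if len(group) > 1]


def find_duplicates_staged(root_dir):
    """Return lists of paths whose full contents are identical."""
    candidates = []
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                candidates.append(path)
    duplicates = []
    # Stage 1: same size. Stage 2: same first 64 KB. Stage 3: same full hash.
    for same_size in group_by(candidates, os.path.getsize):
        for same_prefix in group_by(same_size, lambda p: hash_file(p, PREFIX_BYTES)):
            duplicates.extend(group_by(same_prefix, hash_file))
    return duplicates
```

Each stage only narrows the candidate set; a file is reported as a duplicate only after the full-content hash in the last stage agrees, so a shared prefix alone never counts as a match.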

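Performance tips (beginner-friendly)

Scenario	Optimization	Effect
Very large files (GB scale)	Hash only the first/last few MB as a quick fingerprint	Far faster, but only a prefilter: a change in the unread middle goes undetected, so confirm matches with a full hash
Huge numbers of small files (millions)	Hash in parallel with `concurrent.futures`	Keeps several files in flight at once; can be several times faster, depending on storage and core count
Disk I/O bottleneck	Read in large chunks (or use `hashlib.file_digest` on Python 3.11+)	Fewer system calls
File deduplication	Filter by size first, then by a hash of the first 64KB	Full-file hashes are needed only for the files that survive both filters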
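
The `concurrent.futures` row can be sketched with a thread pool. Threads are a reasonable fit here because file reads release the GIL, and CPython's hashlib also releases it while hashing larger buffers, so several files can be processed at once. The names `hash_one` and `hash_many` and the default of 8 workers are assumptions for illustration; the real speed-up depends on the storage device and should be measured.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_one(filepath, algorithm='sha256', buffer_size=1 << 20):
    """Return (filepath, hex digest), or (filepath, None) if unreadable."""
    h = hashlib.new(algorithm)
    try:
        with open(filepath, 'rb') as f:
            # 1 MiB reads keep the number of system calls low
            while chunk := f.read(buffer_size):
                h.update(chunk)
    except OSError:
        return filepath, None
    return filepath, h.hexdigest()


def hash_many(filepaths, max_workers=8):
    """Hash files concurrently; the result dict keeps the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(hash_one, filepaths))
```

A typical call collects paths with `os.walk` and passes the list to `hash_many`. Files that could not be read map to `None` instead of aborting the whole batch.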

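Ready-to-use commands

In a Linux/Mac terminal:

# 1. Generate a SHA256 checksum file recursively (the checksum file itself is excluded)
find . -type f ! -name all_checksums.sha256 -exec sha256sum {} + > all_checksums.sha256
# 2. Verify in bulk (assuming you have the official checksums.sha256)
sha256sum -c checksums.sha256
# 3. Quickly find duplicate files (fdupes compares size, then MD5, then bytes; install it separately)
fdupes -r .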

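In Windows PowerShell:

# Compute the MD5 of every file (the output file itself is skipped)
Get-ChildItem -Recurse -File | Where-Object { $_.Name -ne 'hashes.txt' } | ForEach-Object {
    $hash = (Get-FileHash -LiteralPath $_.FullName -Algorithm MD5).Hash
    "$($_.FullName): $hash"
} | Out-File hashes.txt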

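These scripts can be copied, saved as .py or .sh files (.ps1 for the PowerShell snippet), and run directly. For more specific needs, such as hashing only image files, adding parallelism, or storing results in a database, they can serve as a starting point to adapt.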