RapidCRC-Unicode
Suggestion: support incremental hashing
I have a disk with 4 TB of files, many of them small.
When I hash, I would like RapidCRC to compare against the old SFV file, check whether each listed file still exists and whether its hash is still correct, and add the hashes of new files to the SFV.
My English is poor, so roughly like this: hash the old files → check whether they exist in the old SFV file and whether their hashes are correct; hash the new files → add their hashes to the SFV. That way, a single hashing run achieves both goals.
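The single pass requested above (verify the paths already listed in the old SFV, hash and add the paths that are not) can be sketched outside RapidCRC as a small Python script. This is a hypothetical helper, not a RapidCRC feature: the function names, the SFV layout (`path CRC32` per line, `;` comments) and the backslash-separated paths relative to the scanned folder are all assumptions. It identifies files by path and uses CRC32 as the checksum.

```python
import os
import zlib

def crc32_of_file(path, chunk_size=1 << 20):
    """Stream the file through zlib.crc32 so large files never sit in memory."""
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"

def parse_sfv(text):
    """Return {relative path: CRC32 hex}; lines starting with ';' are comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        path, _, crc = line.rpartition(" ")  # the CRC is the last token
        if path:
            entries[path] = crc.upper()
    return entries

def format_sfv(entries):
    """Serialize entries sorted by path, one 'path CRC' pair per line."""
    return "".join(f"{path} {crc}\n" for path, crc in sorted(entries.items()))

def incremental_check(root, old_entries):
    """Single pass: verify known paths, hash new ones, list vanished ones."""
    report = {"ok": [], "mismatch": [], "new": [], "missing": []}
    updated = dict(old_entries)  # mismatches keep their old CRC on purpose
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".sfv"):
                continue  # don't hash the checksum files themselves
            full = os.path.join(dirpath, name)
            # Assumption: SFV paths are relative to root and use backslashes.
            rel = os.path.relpath(full, root).replace(os.sep, "\\")
            seen.add(rel)
            crc = crc32_of_file(full)
            if rel not in old_entries:
                report["new"].append(rel)
                updated[rel] = crc
            elif old_entries[rel] == crc:
                report["ok"].append(rel)
            else:
                report["mismatch"].append(rel)
    report["missing"] = [p for p in old_entries if p not in seen]
    for paths in report.values():
        paths.sort()
    return report, updated
```

Mismatching entries deliberately keep their old CRC in `updated`, so writing `format_sfv(updated)` to a new file and running the check again still flags them. Paths are compared case-sensitively here; on Windows you may want to normalize them (e.g. with `str.casefold()`) before comparing.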
See "Workaround" below if you don't care about the details.
- A checksum/hash file like `*.sfv` stores the file path and the checksum, but usually not the folder for which the checksums were calculated, nor whether specific files of that folder were cherry-picked. Users who cherry-pick the files that go into a checksum file would be annoyed if RapidCRC calculated checksums for potentially many more files than needed.
- Is a "new file" new because it is a newly detected file path, or because it is a newly detected checksum result?
- You can use the file path for identification and then detect checksum changes (e.g. CRC32). This is the usual approach.
- You can use a strong hash (e.g. BLAKE3) for identification and then detect file moves. Identification by hash is used in content-addressable storage, as implemented by restic or kopia; because of the collision risk described by the birthday paradox, it is only reasonable with strong hash algorithms, and it is still avoided on enterprise systems such as IBM's.
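The second option (content as identity) can be illustrated with a short Python sketch. It is an assumption-laden illustration, not how RapidCRC or restic/kopia work internally: BLAKE3 is not in Python's standard library, so `hashlib.blake2b` stands in as the strong hash, and `strong_digest` and `classify_by_content` are hypothetical names. It compares two `{path: digest}` snapshots and reports content whose set of paths changed.

```python
import hashlib

def strong_digest(path, chunk_size=1 << 20):
    """BLAKE2b as a stand-in for BLAKE3, which the standard library lacks."""
    h = hashlib.blake2b()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def classify_by_content(old, new):
    """Treat the digest as the file's identity and compare two snapshots.

    Returns (relocated, added, removed):
      relocated: {digest: (old paths, new paths)} for content present in both
                 snapshots whose paths changed (moved, copied, or de-duplicated)
      added:     paths whose content did not exist in the old snapshot
      removed:   paths whose content no longer exists in the new snapshot
    """
    old_paths, new_paths = {}, {}
    for path, digest in old.items():
        old_paths.setdefault(digest, set()).add(path)
    for path, digest in new.items():
        new_paths.setdefault(digest, set()).add(path)
    relocated = {}
    for digest in old_paths.keys() & new_paths.keys():
        if old_paths[digest] != new_paths[digest]:
            relocated[digest] = (sorted(old_paths[digest]), sorted(new_paths[digest]))
    added = sorted(p for d in new_paths.keys() - old_paths.keys() for p in new_paths[d])
    removed = sorted(p for d in old_paths.keys() - new_paths.keys() for p in old_paths[d])
    return relocated, added, removed
```

An edited file shows up as one removed and one added path, because with content as identity an edit is indistinguishable from replacing the file. That is the trade-off against path-based identification, which reports the same case as a checksum change.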
Since "checksum and file change detection" can become complex, I don't think either of these will be implemented. The only program I know of that also searches for additional files (MultiPar, for the Parchive format) has problems when dealing with a lot of small files or with a huge amount of data.
Checksum storage method | Detects checksum mismatches? | Detects missing files? | Shows which files have no recognized checksum? | Works with arbitrarily renamed files? | Comment |
---|---|---|---|---|---|
Central checksum file | ✔️ | ✔️ | ❌ | ❌ | Usual approach |
Decentralized, in the file name (e.g. you always check all files of a folder) | ✔️ | ❌ | ✔️ | ❌ | Also common; works only as long as the checksum is preserved on rename (so avoid automated renaming tools) |
Decentralized, in sticky NTFS streams (e.g. you always check all files of a folder) | ✔️ | ❌ | ✔️ | ✔️ | NTFS streams only survive as long as files are moved/stored within NTFS volumes |
Note: The latter two decentralized storage options are automatically recognized and checked when RapidCRC is not verifying a checksum file (i.e. when it is only calculating "new" checksums for files).
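The file-name row of the table can be made concrete with a small Python sketch. It assumes the widespread convention of embedding the CRC32 as eight hex digits in square brackets (e.g. `episode01 [3610A686].mkv`); RapidCRC's own naming pattern may differ, and `embedded_crc` and `check_named_file` are hypothetical helpers. Each file carries its own expected value, so a file without one can be reported, but a deleted file is simply never seen.

```python
import os
import re
import zlib

# Assumption: CRC32 embedded as 8 hex digits in brackets, e.g. "name [3610A686].ext".
# RapidCRC's exact file-name pattern may differ.
CRC_IN_NAME = re.compile(r"\[([0-9A-Fa-f]{8})\]")

def embedded_crc(filename):
    """Return the last bracketed CRC32 in the name (upper case), or None."""
    matches = CRC_IN_NAME.findall(filename)
    return matches[-1].upper() if matches else None

def check_named_file(path, chunk_size=1 << 20):
    """Return 'ok', 'mismatch', or 'no checksum' for a single file."""
    expected = embedded_crc(os.path.basename(path))
    if expected is None:
        return "no checksum"  # the table's "files without recognized checksum"
    crc = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return "ok" if f"{crc:08X}" == expected else "mismatch"
```

Running `check_named_file` over every file of a folder covers the first and third columns of that row; nothing in this scheme records which files should exist, which is why the "missing files" column stays ❌.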
Workaround
You can calculate a new checksum file. Since it is only a text file, first make sure the lines of the new and old checksum files are in the same order (if not, sort all lines alphabetically with a tool); a text comparison program of your choice will then show added, removed, and different lines, and thus new, deleted, and mismatching files between the two checksum files.
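The comparison step of this workaround can also be scripted. The Python sketch below is a hypothetical helper (`sfv_entries` and `compare_sfv` are illustrative names) that assumes one `path CRC32` pair per line with `;` comment lines. Instead of sorting and diffing raw lines, it matches entries by path, which yields the same new / deleted / mismatching classification.

```python
def sfv_entries(text):
    """Map path -> CRC32 (upper case), skipping blank and ';' comment lines."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(";"):
            path, _, crc = line.rpartition(" ")  # the CRC is the last token
            if path:
                entries[path] = crc.upper()
    return entries

def compare_sfv(old_text, new_text):
    """Return (new, deleted, mismatching) paths between two SFV files."""
    old, new = sfv_entries(old_text), sfv_entries(new_text)
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return added, deleted, changed
```

Because entries are matched by path, the two files do not need to be sorted first. A renamed file still appears as one deleted and one new entry, exactly as it would in a line-based text comparison.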
Incremental hashing doesn't really fit that well into the concept of RapidCRC. I will most likely not add this.