Save checksums to manifest textfile/md5sum
Hi - this might be similar to https://github.com/tedsmith/quickhash/issues/63, but I was wondering if you could add an option to save checksums to a manifest text file, perhaps using a style similar to md5sum:
checksum relative/path/to/file
I work in the National Library of Ireland and we like to get donors to generate checksums on their end before sending the files to us. Quickhash is so great for this as one can just drag and drop into a GUI. Being able to save a list of files would be ideal. So far, I've just tested it with small donations where two files were transferred, and checksums were copy/pasted.
So a sidecar file might contain checksums like this:
ae8c5d5f6288964d65c13459f5334258 storage/2020-09-21/rokyu2aberfc5jto/indexes/index-20200921183136-CXOKODVD.cdxj
e9fe18d9eeafddf6712dc95199ca0611 behaviors/dist/soundcloudArtistBehavior.js
da45bba92fc439ba6c547e07ec64453d data/Webrecorder-Data/behaviors/dist/instagramUserBehavior.js
What do you think?
I just noticed this issue https://github.com/tedsmith/quickhash/issues/70, which would go some way to resolving my issue. I think that md5sum-style text manifests are quite a common approach, though. The format is also used in the Library of Congress BagIt standard, and it would allow for interoperability with other checksumming tools.
Hiya. Not sure I follow. The ability to export to CSV\TSV already exists, and that export holds the filename, path etc. What does the manifest textfile suggestion add over and above that? Sorry for my confusion.
So sorry, I actually replied to this yesterday and I must not have submitted! The ability to create text checksum files similar to md5sum output would allow for greater interoperability with other checksum tools like the sum tools, hashdeep, and even a bunch of custom Python scripts that exist within the digital preservation community.
OK. But I still don't quite follow I'm afraid. The user can output the results of QH to CSV or HTML where the filenames, paths, and hash(es) are saved. I'm not sure how your suggestion of a "manifest" differs? Isn't the output from QH considered as a "manifest"?
Your example would be a manifest as well, but you would not be able to use that manifest with any tool other than QuickHash (without editing the manifest) to validate the checksums. Using the md5sum style of checksum relative/file/path, however, allows other tools to use these kinds of manifests.
This is what the Library of Congress BagIt software produces:
ae8c5d5f6288964d65c13459f5334258 storage/2020-09-21/rokyu2aberfc5jto/indexes/index-20200921183136-CXOKODVD.cdxj
e9fe18d9eeafddf6712dc95199ca0611 behaviors/dist/soundcloudArtistBehavior.js
da45bba92fc439ba6c547e07ec64453d data/Webrecorder-Data/behaviors/dist/instagramUserBehavior.js
and there are several tools out there that would be able to validate those checksums. My hope for QuickHash is that if it can also create these types of manifests, it will be an even more useful tool for digital preservation activities within libraries and archives. I have already asked a donor to use QuickHash to create checksums locally before zipping and uploading to the cloud for transfer. It would be ideal if a more interoperable checksum manifest could be used, and I think the main format is that md5sum (or shasum) style.
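To make the request concrete, here is a minimal Python sketch of how a manifest like this could be produced for a folder. It is only an illustration of the format, not how QuickHash or BagIt work internally; the function names (`file_md5`, `build_manifest_lines`, `write_manifest`) are hypothetical. It assumes the GNU md5sum convention of two spaces between the checksum and the path, and writes paths relative to the chosen root with forward slashes.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest_lines(root):
    """Return sorted 'checksum  relative/path' lines for every file under root."""
    root = Path(root)
    lines = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Store the path relative to root so the manifest stays valid
        # after the folder is moved or delivered elsewhere.
        relative = path.relative_to(root).as_posix()
        lines.append(f"{file_md5(path)}  {relative}")
    return lines


def write_manifest(root, manifest_path):
    """Hash the tree first, then write the manifest (hypothetical helper)."""
    lines = build_manifest_lines(root)
    Path(manifest_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

If the manifest is saved inside the folder being hashed, a second run would include the manifest itself, so a real implementation would probably want to exclude it or write it alongside the folder instead.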
Ok, coming back to this again as another archive has the following use case:
- Various stakeholders need to deposit files with checksums, which are then verified upon delivery.
- I'm hoping that QuickHash will be the tool that's used, as it's open-source and cross-platform with a GUI.
- The current CSV reports use absolute paths, so if relative paths were used, the checksum manifest that is created could be easily verified when the files are delivered to the archive.
- If there was an option to just store checksum relative/file/path, then multiple tools in the archive could be used to validate the checksum manifest, as that style of manifest is the most commonly used in archival settings.
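As a rough sketch of the verification step on the archive side, the following Python function checks a delivered folder against a manifest of this kind. The function name `verify_manifest` and the status strings are hypothetical, MD5 is assumed as the algorithm, and each file is read whole for brevity (large files would need chunked reads).

```python
import hashlib
from pathlib import Path


def verify_manifest(manifest_path, root):
    """Check each 'checksum path' line against root; return {path: status}."""
    root = Path(root)
    results = {}
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first run of whitespace so paths may contain spaces.
        expected, relative = line.split(maxsplit=1)
        relative = relative.lstrip("*")  # md5sum marks binary mode with '*'
        target = root / relative
        if not target.is_file():
            results[relative] = "MISSING"
            continue
        actual = hashlib.md5(target.read_bytes()).hexdigest()
        results[relative] = "OK" if actual == expected.lower() else "FAILED"
    return results
```

Because the paths are relative, the same manifest works wherever the files end up, which is the part the absolute-path CSV reports cannot do without editing.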
To see a practical example, imagine getting the contents of the attached zipfile (Archive.zip) as an online or hard drive delivery. Having the manifest alongside the files like this allows many tools to validate the delivery. I'm hoping that QuickHash can produce this rather than the CSV/HTML absolute path reports. TeraCopy does something similar to the manifest I'm mentioning, but it's Windows-only and closed-source.
Hi Ted, just checking in on this again to see if it's something you're interested in supporting?