Possibly adding mtime or more metadata
Hello,
Hashit is one of the fastest check-summing tools. I would very much like to use it for about 200 TB of data spread over several servers, as it would offer significant time savings, especially on hard disks.
I would also need to collect a bit more metadata, at least mtime. To do that, at the moment I need to recursively run another tool.
Would you consider adding such an option, perhaps including even more metadata?
One possibility that comes to mind is a printf-style syntax, like rhash's format options.
Thank you for considering this.
Sure, more than happy to add this. What do you mean by mtime though? Could you show me some examples? If so, I can probably add it very quickly.
Thank you!
This page helped me clarify mtime vs ctime aspects in Unix/Linux systems.
md5deep has a -t option for "creation time", which is "changed time" on Unix, but mtime would be more useful, as we are tracking hashes of content, not metadata. Having both would be even better.
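To make the distinction concrete, here is a minimal Go sketch of reading mtime. Go's standard library exposes mtime portably through `os.FileInfo.ModTime`, whereas ctime needs platform-specific code. The helper names `mtimeOf` and `demo` are hypothetical and are not taken from hashit.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// mtimeOf returns the modification time (mtime) of the file at path.
// Hypothetical helper for illustration; not taken from hashit.
func mtimeOf(path string) (time.Time, error) {
	info, err := os.Stat(path)
	if err != nil {
		return time.Time{}, err
	}
	return info.ModTime(), nil
}

// demo creates a temporary file, sets a known mtime on it, and returns
// that mtime as read back from the file system, formatted in UTC.
func demo() (string, error) {
	f, err := os.CreateTemp("", "mtime-demo-*.txt")
	if err != nil {
		return "", err
	}
	defer os.Remove(f.Name())
	if err := f.Close(); err != nil {
		return "", err
	}

	// Chtimes sets atime and mtime explicitly so the result is reproducible.
	known := time.Date(2019, 11, 25, 22, 2, 37, 0, time.UTC)
	if err := os.Chtimes(f.Name(), known, known); err != nil {
		return "", err
	}

	got, err := mtimeOf(f.Name())
	if err != nil {
		return "", err
	}
	return got.UTC().Format("2006-01-02 15:04:05"), nil
}

func main() {
	s, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(s)
}
```

Formatting in UTC keeps the value comparable across servers in different time zones, which may matter when collecting metadata from several machines.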
I found rhash's printf format switches more useful, as one can better compose the output. But even a switch like md5deep's would be excellent.
In the former case, as an uninformed example, the command hashit --format printf '%s,%{md5},%f,%d,%{mtime}\n' test.txt would return something like:
61537206,b2d3c0e03cd0c0e56e60c7a395d6f4cd,test.txt,/Users/lispstudent/,2019-11-25 22:02:37
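As a rough illustration of how such a format string could be expanded, here is a small Go sketch. The token meanings (`%s` size, `%f` file name, `%d` directory, `%{name}` a hash or mtime) are assumptions taken from the example above, and the `record` type and `expand` function are hypothetical; this is neither rhash's nor hashit's implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// record holds the values a format string may refer to (hypothetical type).
type record struct {
	Size   int64
	Name   string
	Dir    string
	MTime  string
	Hashes map[string]string // keyed by lower-case hash name, e.g. "md5"
}

// expand replaces %s, %f, %d, %% and %{name} tokens in format with values
// from r. Unknown single-letter tokens are emitted unchanged.
func expand(format string, r record) string {
	var b strings.Builder
	for i := 0; i < len(format); i++ {
		c := format[i]
		if c != '%' || i+1 >= len(format) {
			b.WriteByte(c)
			continue
		}
		i++ // look at the character after '%'
		switch format[i] {
		case 's':
			b.WriteString(strconv.FormatInt(r.Size, 10))
		case 'f':
			b.WriteString(r.Name)
		case 'd':
			b.WriteString(r.Dir)
		case '%':
			b.WriteByte('%')
		case '{':
			end := strings.IndexByte(format[i:], '}')
			if end < 0 {
				// Unterminated token: emit the rest literally.
				b.WriteString("%" + format[i:])
				return b.String()
			}
			name := format[i+1 : i+end]
			if name == "mtime" {
				b.WriteString(r.MTime)
			} else {
				b.WriteString(r.Hashes[name])
			}
			i += end // now at '}', the loop increment skips it
		default:
			b.WriteByte('%')
			b.WriteByte(format[i])
		}
	}
	return b.String()
}

func main() {
	r := record{
		Size:   61537206,
		Name:   "test.txt",
		Dir:    "/Users/lispstudent/",
		MTime:  "2019-11-25 22:02:37",
		Hashes: map[string]string{"md5": "b2d3c0e03cd0c0e56e60c7a395d6f4cd"},
	}
	fmt.Print(expand("%s,%{md5},%f,%d,%{mtime}\n", r))
}
```

With the values from the example, this prints the same comma-separated line as shown above. A real implementation would also have to decide how to treat escapes such as `\n` arriving literally from the shell, which this sketch leaves to the Go string literal.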
Ah, gotcha. Yeah, seems reasonable. I'll have a look at implementing this over the next few days then. I don't think it should be too hard, but then again I have been wrong before.
Let me know if I can help with testing. I am using macOS, and I could try to compile on some FreeBSD servers too.
Will do. I am able to test on macOS and Linux easily, and Windows with a bit of effort.
https://github.com/djherbis/times
This looks like it might solve the issue
Working with this on a branch, with the following output:
$ hashit -f hashdeep --mtime processor
%%%% HASHDEEP-1.0
%%%% size,md5,sha256,filename,mtime
## Invoked from: /Users/boyter/Documents/projects/hashit
## $ hashit -f hashdeep --mtime processor
##
2840,ce29ce9a95713628e1d8e43a51027ac1,7dcc785a34ce95c4e741e92177f221e6d05d9c1663481f35c54286fc6645934f,processor/workers_test.go,2021-11-15 09:03:56
412,fea2253b11e12a134efebee40c7ca544,d6d0410c9fd662f08ccf2586661ff3c9623c68d209dec680ac8553ae5ebcf899,processor/file.go,2021-11-15 09:18:30
406,b8db244d45fa9eb0f1d510b107c6cf03,f432b5f092b7082cd5c2f01cba61d093e57e46b72678f1cfc7eb1b17ff30e2f6,processor/structs.go,2024-07-26 12:04:56
8179,61feb67b40e75ffd0279478ac288d697,e3ee2622a4b7747969514ce7f2e7cb4d01bed1797904793bb23d34e685a2ef90,processor/formatters.go,2024-07-26 12:12:20
4841,22fa73c10faca3bf40e576a5699cc479,bbfaa989480ca68ffc134ca396f900cd6b21e7bd4f5652d03c0692b49e2d84fb,processor/processor.go,2024-07-26 12:13:06
20906,7febbe6c7b0fb1d1e1f5a98ab0199a0f,a194f7f7719bfa06c3a1fd084f1cc62d04e5f8879aa00af076366cb991b339f7,processor/workers.go,2024-07-26 12:06:27
Also added to JSON and such
$ hashit -f json --mtime main.go | jq
[
{
"File": "main.go",
"MD4": "",
"MD5": "80ce62ad784fdcacaee9b6ff30fd5f3e",
"SHA1": "9378850dc3f9833f3a0462643485d96a86fca348",
"SHA256": "dd9ec0cabad718fd1bb248ed0f77351a072c57a957ca7aa3c72bf23ce29816ea",
"SHA512": "05624869306a86f2820c6ad4d6b2f47a3de24221b980240a20e750c6fc3172bd834b4ec9cf66fdaa95460aef035bbf3c4dedada7a65ee6ae7b5a700656e55ce7",
"Blake2b256": "",
"Blake2b512": "",
"Blake3": "",
"Sha3224": "",
"Sha3256": "",
"Sha3384": "",
"Sha3512": "",
"Bytes": 2147,
"MTime": "2024-07-26T12:03:41.782624482+10:00"
}
]
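For anyone consuming this output programmatically, here is a minimal Go sketch that decodes it and reads the `MTime` field. It assumes only the field names visible in the JSON above; the `result` type and `parseResults` function are hypothetical. Go's `time.Time` unmarshals RFC 3339 timestamps with fractional seconds directly.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// result mirrors only the fields of interest from the JSON shown above;
// fields not listed here are ignored by encoding/json.
type result struct {
	File  string
	MD5   string
	Bytes int64
	MTime time.Time
}

// parseResults decodes a JSON array of results.
func parseResults(data []byte) ([]result, error) {
	var rs []result
	if err := json.Unmarshal(data, &rs); err != nil {
		return nil, err
	}
	return rs, nil
}

func main() {
	data := []byte(`[{"File":"main.go","MD5":"80ce62ad784fdcacaee9b6ff30fd5f3e","Bytes":2147,"MTime":"2024-07-26T12:03:41.782624482+10:00"}]`)
	rs, err := parseResults(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, r := range rs {
		// Normalise to UTC so values from different servers compare cleanly.
		fmt.Printf("%s %s %d %s\n", r.File, r.MD5, r.Bytes, r.MTime.UTC().Format(time.RFC3339))
	}
}
```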
@lispstudent is this the sort of thing you were thinking about?
Note that I have not done a printf-style format yet... because I am lazy, but it's something I think I might add via a new format type, since it would not be compatible with the hashdeep format as I understand it.
That is excellent. As long as the --mtime switch is there, I am fine without the printf format.
If I run sha1deep with -t, this is the output I get:
sha1deep -zt test.txt
2291164 6dea7e33c15032c0db470e6d5efb9da2342d5c1b 2024:02:29:09:05:25 /Users/lispstudent/test.txt
Thank you!
Latest release should have this for you.
Thank you for the release!
Testing on macOS on an ARM CPU, it seems I can see the mtime value only with --format json.
e.g.
hashit --hash md5 --mtime --format json a.csv
[{"File":"a.csv","MD4":"31d6cfe0d16ae931b73c59d7e0c089c0","MD5":"08e30724d71d40b07e1b412ec30cda37","SHA1":"da39a3ee5e6b4b0d3255bfef95601890afd80709","SHA256":"e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","SHA512":"cf83e1357eefb8bdf1542850d66d8007d620e4050b5715dc83f4a921d36ce9ce47d0d13c5d85f2b0ff8318d2877eec2f63b931bd47417a81a538327af927da3e","Blake2b256":"0e5751c026e543b2e8ab2eb06099daa1d1e5df47778f7787faab45cdf12fe3a8","Blake2b512":"786a02f742015903c6c6fd852552d272912f4740e15847618a86e217f71f5419d25e1031afee585313896444934eb04b903a685b1448b755d56f701afe9be2ce","Blake3":"af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262","Sha3224":"6b4e03423667dbb73b6e15454f0eb1abd4597f9a1b078e3f5b5a6bc7","Sha3256":"a7ffc6f8bf1ed76651c14756a061d662f580ff4de43b49fa82d80a4b80f8434a","Sha3384":"0c63a75b845e4f7d01107d852e4c2485c51a50aaaa94fc61995e71bbee983a2ac3713831264adb47fb6bd1e058d5f004","Sha3512":"a69f73cca23a9ac5c8b567dc185a756e97c982164fe25859e0d1dcc1475c80a615b2123af1f5f94c11e3e9402c3ac558f500199d95b6d3e301758586281dcd26","Bytes":1258715,"MTime":"2024-02-13T13:05:11.3714732+01:00"}]%
With --format text, by contrast, no mtime is shown:
hashit --hash md5 --mtime --format text a.csv
a.csv (1258715 bytes)
MD5 08e30724d71d40b07e1b412ec30cda37
Also, --format json outputs all checksums, not just the one requested. So it seems hashit computes all checksums regardless of the command-line switch.
I am so sorry @lispstudent, I totally missed your response. I think I somehow screwed up a merge and never noticed. Mtime is now in all the outputs for all format types, as you would expect, with the exception of standard input processing, where there is no file from which to read the time. In that case you get this error:
$ echo "hello" | hashit --mtime
ERROR 2025-12-01T07:24:23Z: cannot use --mtime option with standard input, ignoring flag
stdin (6 bytes)
MD5 b1946ac92492d2347c6235b4d2611184
SHA1 f572d396fae9206628714fb2ce00f72e94f2258f
SHA256 5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03
SHA512 e7c22b994c59d9cf2b48e549b1e24666636045930d3da7c1acb299d1c3b7f931f94aae41edda2c2b207a36e10f8bcb8d45223e54878f5b316e7ce3b6bc019629
Thanks @clach04 for pointing this out.
Thank you for fixing this.
I am still waiting for upstream to help us solve #15, then I will test this on our SmartOS servers. Much looking forward to it.
If I ever forget again due to closing, PLEASE ping me. I am not bothered by this, and I'd rather fix the issue than have it hidden.
If the SQLite thing isn't resolved quickly, I may remove the reliance on it being built, and instead switch to using stdout redirects to run the SQL.
I wanted to avoid the issues with escaping, and thought having it built in would be useful and would allow the audit to use SQLite.
I think it would be better if it could be fixed upstream, but if not let me know.
I am not sure if relevant, but there was a merge request last week.
Regarding using stdout redirects, please forgive my ignorance: would that mean launching a subshell each time? Why was using native SQLite with a native Go driver not deemed satisfactory?
Yes, it would mean doing something like the following:
hashit --format sql DIRECTORY | sqlite3 hashit.db
to populate the database. It does mean having sqlite installed, however.
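To show what the escaping concern amounts to when emitting SQL as text, here is a small Go sketch. It assumes a hypothetical `file_hashes` table and column names; the real output of `--format sql` may differ. In SQLite string literals, a single quote is escaped by doubling it.

```go
package main

import (
	"fmt"
	"strings"
)

// sqlQuote returns s as a SQLite string literal: wrapped in single quotes,
// with any embedded single quote doubled.
func sqlQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", "''") + "'"
}

// insertStatement builds one INSERT for a hypothetical file_hashes table.
func insertStatement(path, md5 string, size int64, mtime string) string {
	return fmt.Sprintf(
		"INSERT INTO file_hashes (filepath, md5, size, mtime) VALUES (%s, %s, %d, %s);",
		sqlQuote(path), sqlQuote(md5), size, sqlQuote(mtime))
}

func main() {
	fmt.Println("CREATE TABLE IF NOT EXISTS file_hashes (filepath TEXT, md5 TEXT, size INTEGER, mtime TEXT);")
	// Wrapping the inserts in one transaction keeps bulk loading fast.
	fmt.Println("BEGIN;")
	fmt.Println(insertStatement("docs/it's here.txt", "08e30724d71d40b07e1b412ec30cda37", 1258715, "2024-02-13T13:05:11+01:00"))
	fmt.Println("COMMIT;")
}
```

Output of this shape can be piped straight into `sqlite3 hashit.db`, so only one sqlite3 process is started per run rather than one per file.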
The reason for not using the mattn sqlite driver is that it brings a CGO dependency, which makes cross-compiling a bit of a pain. While the modernc Go port is slightly slower, it makes building trivial without needing to set up the C toolchains. It's why I use it in every project where I want SQLite with Go integration these days. Just so much easier to work with.
Understood, thank you.