No elegant way to stream a hash given a hash code
Hi,
In previous multihash versions, we could compute the digest in a streamed manner using MultihashDigest::input, and it was possible to get a boxed MultihashDigest given a multihash.
I currently see no way of doing the same, which is an issue in some use cases.
For example, I need to validate a digest computed from a file. Since the file can be big, I want to use the new StatefulHasher trait. However, I found no way to get a trait object.
Here’s my code:
use std::convert::TryFrom;
use std::io::{Error, ErrorKind};
use std::path::Path;

use multihash::{Multihash, MultihashDigest};

pub fn validate_file_checksum(expected_digest: &str, file_path: &Path) -> std::io::Result<bool> {
    let (_, hash_data) = multibase::decode(expected_digest)
        .map_err(|e| Error::new(ErrorKind::InvalidInput, e))?;
    let expected_digest = Multihash::from_bytes(&hash_data)
        .map_err(|e| Error::new(ErrorKind::InvalidInput, e))?;
    let hash_code = multihash::Code::try_from(expected_digest.code())
        .map_err(|e| Error::new(ErrorKind::InvalidInput, e))?;

    // FIXME: multihash new API is breaking this code for streaming hashing (checked for version 0.14)
    //
    //const BUF_SIZE: usize = 1024 * 128;
    //let file = File::open(file_path)?;
    //let mut reader = BufReader::with_capacity(BUF_SIZE, file);
    //
    //let hasher = todo!("get an appropriate trait object hasher given the hash code");
    //
    //loop {
    //    let length = {
    //        let buffer = reader.fill_buf()?;
    //        hasher.update(buffer);
    //        buffer.len()
    //    };
    //    if length == 0 {
    //        break;
    //    }
    //    reader.consume(length);
    //}
    //
    //let digest_found = hasher.finalize();
    //
    // So instead, we read the whole file into memory (as raw bytes, so non-UTF-8 files work too):
    let file_content = std::fs::read(file_path)?;
    let digest_found = hash_code.digest(&file_content);

    Ok(expected_digest == digest_found)
}
If I overlooked something, please let me know!
Thank you
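For reference, here is a minimal, std-only sketch of the streaming loop that the commented-out code above is aiming for. The `StreamingHasher` trait, the toy `Fnv1a64` hasher, and `hash_stream` are all hypothetical names invented for this illustration; none of them are part of the multihash API. The only point is to show the `fill_buf`/`consume` pattern working against a hasher passed as a trait object.

```rust
use std::io::{BufRead, BufReader, Read};

/// Hypothetical stand-in for a streaming hasher; NOT the multihash API.
pub trait StreamingHasher {
    fn update(&mut self, input: &[u8]);
    fn finalize(&self) -> Vec<u8>;
}

/// Toy FNV-1a 64-bit hasher, used only so the sketch is runnable.
pub struct Fnv1a64(u64);

impl Default for Fnv1a64 {
    fn default() -> Self {
        Fnv1a64(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl StreamingHasher for Fnv1a64 {
    fn update(&mut self, input: &[u8]) {
        for &byte in input {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finalize(&self) -> Vec<u8> {
        self.0.to_be_bytes().to_vec()
    }
}

/// Feeds `source` to `hasher` in chunks of at most `buf_size` bytes,
/// so the whole input never has to sit in memory at once.
pub fn hash_stream<R: Read>(
    source: R,
    hasher: &mut dyn StreamingHasher,
    buf_size: usize,
) -> std::io::Result<Vec<u8>> {
    let mut reader = BufReader::with_capacity(buf_size, source);
    loop {
        let length = {
            let buffer = reader.fill_buf()?;
            hasher.update(buffer);
            buffer.len()
        };
        if length == 0 {
            break; // an empty buffer means end of input
        }
        reader.consume(length);
    }
    Ok(hasher.finalize())
}
```

With something like this, `hash_stream(File::open(path)?, &mut hasher, 128 * 1024)` would replace the `read` call, provided a suitable hasher can be obtained from the hash code at runtime.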
It is a bit confusing, as both Hasher and StatefulHasher implement Default, but if you are explicit about what you want, Rust will give it to you.
let hasher: StatefulHasher = Identity256::default();
hopefully this works for you :)
Hi :)
Thank you for the answer, but this is not what I’m looking for.
I need to get a hasher from a hash code I can’t know ahead of time (see my snippet above).
The issue is precisely that we can’t use StatefulHasher except with a specific algorithm known at compile time, like you mentioned, which kind of defeats the purpose of multihash to some extent :/
The new API is very nice when using digest is acceptable though!
I had a look. I currently see no way of doing it with the current code. The way things currently work, you cannot return a StatefulHasher based on the Code, as the StatefulHashers depend on specific Digests (please correct me if I'm wrong).
I've one idea though. Lots of the code is generated. So perhaps we could generate a companion struct to the Code enum, which implements the StatefulHasher functionality for all the Codes. That struct would be returned by a Code::hasher() call. I'm not sure if that would work, but it might be worth a try.
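To make the idea a bit more concrete, here is a rough sketch of what such a generated companion type could look like, written with std only. Everything in it is hypothetical: the `Code` enum below is a two-variant stand-in (not the real generated one), `CodeHasher` is an invented name, and the two toy hashers exist only so the sketch compiles. The digest is returned as a `Vec<u8>` here to sidestep the differing digest sizes; the real thing would presumably return a Multihash instead.

```rust
/// Hypothetical stand-in for the generated `Code` enum.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Code {
    Fnv1a32,
    Sum8,
}

/// Toy 32-bit FNV-1a hasher (4-byte digest).
pub struct Fnv1a32(u32);

impl Fnv1a32 {
    fn new() -> Self {
        Fnv1a32(0x811c_9dc5) // FNV-1a 32-bit offset basis
    }
    fn update(&mut self, input: &[u8]) {
        for &byte in input {
            self.0 ^= u32::from(byte);
            self.0 = self.0.wrapping_mul(0x0100_0193); // FNV prime
        }
    }
    fn finalize(&self) -> [u8; 4] {
        self.0.to_be_bytes()
    }
}

/// Toy additive checksum (1-byte digest).
pub struct Sum8(u8);

impl Sum8 {
    fn new() -> Self {
        Sum8(0)
    }
    fn update(&mut self, input: &[u8]) {
        for &byte in input {
            self.0 = self.0.wrapping_add(byte);
        }
    }
    fn finalize(&self) -> [u8; 1] {
        [self.0]
    }
}

/// Companion to `Code`: one variant per algorithm, each holding that
/// algorithm's streaming state. This is what a macro could generate.
pub enum CodeHasher {
    Fnv1a32(Fnv1a32),
    Sum8(Sum8),
}

impl Code {
    /// Picks the hasher at runtime from the code.
    pub fn hasher(self) -> CodeHasher {
        match self {
            Code::Fnv1a32 => CodeHasher::Fnv1a32(Fnv1a32::new()),
            Code::Sum8 => CodeHasher::Sum8(Sum8::new()),
        }
    }
}

impl CodeHasher {
    /// Forwards the chunk to whichever concrete hasher is inside.
    pub fn update(&mut self, input: &[u8]) {
        match self {
            CodeHasher::Fnv1a32(h) => h.update(input),
            CodeHasher::Sum8(h) => h.update(input),
        }
    }

    /// Returns the code together with the digest bytes.
    pub fn finalize(&self) -> (Code, Vec<u8>) {
        match self {
            CodeHasher::Fnv1a32(h) => (Code::Fnv1a32, h.finalize().to_vec()),
            CodeHasher::Sum8(h) => (Code::Sum8, h.finalize().to_vec()),
        }
    }
}
```

The caller only ever sees `CodeHasher`, so no trait object is needed, and adding an algorithm means adding one variant and one arm per `match`, which is the kind of repetition code generation handles well.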
> I had a look. I currently see no way of doing it with the current code. The way things currently work, you cannot return a StatefulHasher based on the Code, as the StatefulHashers depend on specific Digests (please correct me if I'm wrong).
Exactly!
> I've one idea though. Lots of the code is generated. So perhaps we could generate a companion struct to the Code enum, which implements the StatefulHasher functionality for all the Codes. That struct would be returned by a Code::hasher() call. I'm not sure if that would work, but it might be worth a try.
This would be really helpful!
However, it might not be very straightforward, because StatefulHasher has associated types and the implementing structs use different types for them (because of the different digest sizes).
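One possible way around the associated-type problem, sketched below with std only, is to erase the digest type behind a second, object-safe trait. The `StatefulHasher` trait here is a simplified stand-in written for this example (the real one has more items), and `DynHasher`, `hasher_for`, the two toy hashers, and the numeric codes are all hypothetical. The cost of this approach is a heap allocation for the digest, since it is returned as a `Vec<u8>`.

```rust
/// Simplified stand-in for a hasher trait with an associated digest
/// type. The `Default` supertrait alone already rules out `dyn StatefulHasher`.
pub trait StatefulHasher: Default {
    type Digest: AsRef<[u8]>;
    fn update(&mut self, input: &[u8]);
    fn finalize(&self) -> Self::Digest;
}

/// Hypothetical object-safe view: no associated types, no `Default`.
pub trait DynHasher {
    fn update_dyn(&mut self, input: &[u8]);
    fn finalize_dyn(&self) -> Vec<u8>;
}

/// Blanket impl: every `StatefulHasher` is usable as a `DynHasher`.
impl<H: StatefulHasher> DynHasher for H {
    fn update_dyn(&mut self, input: &[u8]) {
        self.update(input)
    }
    fn finalize_dyn(&self) -> Vec<u8> {
        // The digest size is erased by copying into a Vec.
        self.finalize().as_ref().to_vec()
    }
}

/// Toy hasher with a 1-byte digest.
#[derive(Default)]
pub struct Xor8(u8);

impl StatefulHasher for Xor8 {
    type Digest = [u8; 1];
    fn update(&mut self, input: &[u8]) {
        for &byte in input {
            self.0 ^= byte;
        }
    }
    fn finalize(&self) -> [u8; 1] {
        [self.0]
    }
}

/// Toy hasher with a 4-byte digest.
#[derive(Default)]
pub struct Sum32(u32);

impl StatefulHasher for Sum32 {
    type Digest = [u8; 4];
    fn update(&mut self, input: &[u8]) {
        for &byte in input {
            self.0 = self.0.wrapping_add(u32::from(byte));
        }
    }
    fn finalize(&self) -> [u8; 4] {
        self.0.to_be_bytes()
    }
}

/// Runtime lookup from a (made-up) numeric code to a boxed hasher.
pub fn hasher_for(code: u64) -> Option<Box<dyn DynHasher>> {
    match code {
        0x01 => Some(Box::new(Xor8::default())),
        0x02 => Some(Box::new(Sum32::default())),
        _ => None,
    }
}
```

Because both hashers end up behind the same `Box<dyn DynHasher>`, the differing `Digest` types no longer leak into the caller's signature.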
> (because different digest size).
When you derive a Multihash via #[derive(Multihash)], all digests should have the same size. So at least that part should work (others may not ;)