dmd
dmd copied to clipboard
replace usages of md5 with blake3
Add an implementation of Blake3 cryptographic hash function to replace MD5. The default hash length for blake3 is 32 bytes rather than 16, so I use a truncated version of the hash to keep the lengths the same as they were with MD5. Since Blake3 is designed to produces hashes of any desired length, I'm assuming that a shorter than default hash is at least a collision resistant as md5 (though I didn't find a guarantee of this)
The implementation here is a bit simplified from the reference implementation because it doesn't support streaming (the API is just ubyte[32] blake3(ubyte[])).
I'm assuming there are consequences of this change that I'm not aware of and hence not fixing here, so LMK what else needs to be fixed.
The rationale for this change is that MD5 is considered insecure with known weaknesses. While these are probably not exploitable in the compiler, I think it's a good PR move to dump MD5. Also, blake3 is supposed to be faster than MD5, though I haven't benchmarked this change (and some of the speedups come from being able to parallelize hashing big inputs which won't be a concern in the compiler).
Also, I'm on an ARM Mac so can't test this locally (yet, I'll work that early next week).
I assume I also need to add some copyright stuff, etc.
Thanks for your pull request and interest in making D better, @benjones! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:
- My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
- My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
- I have provided a detailed rationale explaining my changes
- New or modified functions have Ddoc comments (with
Params:andReturns:)
Please see CONTRIBUTING.md for more information.
If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references
Your PR doesn't reference any Bugzilla issue.
If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally
If you don't have a local development environment setup, you can use Digger to test this PR:
dub run digger -- build "master + dmd#16175"
It would be nice to run a benchmark by compiling s huge project, to see the benefits
It would be nice to run a benchmark by compiling s huge project, to see the benefits
Agree, I'll try to run something on a linux box sometime this week
I don't understand this compiler error in the CI stuff? It builds/runs fine using ldc locally. Where is there an attempted implicit cast to uint?
ulong cursor = 0;
//greater because if it's == we still need to finalize the last
//chunk
while(data.length - cursor > ChunkLength){
const ubyte[] chunk = data[cursor .. cursor + ChunkLength];
src/dmd/common/blake3.d(34): Error: cannot implicitly convert expression `cursor` of type `ulong` to `uint`
src/dmd/common/blake3.d(34): Error: cannot implicitly convert expression `cursor + 1024LU` of type `ulong` to `uint`
Where is there an attempted implicit cast to uint?
It fails on x86, so when .length is 32 bits, not 64. Use size_t for cursor ?
cc @WalterBright
It's a busy week, so doubt I'll get a chance to do the experiment for a while. I agree that I doubt there will be a measurable effect, but agree it would be good to confirm that.
OK, I ran a quick experiment. Using master, compiled with dmd-2.107.0 and my branch compiled with dmd-2.107.0 I timed building phobos on a linux machine. My branch took 21.33s at best, master took 21.13s at best. Not super thorough, but I think this shows that this change has negligible impact on build times.
ping