dmd icon indicating copy to clipboard operation
dmd copied to clipboard

replace usages of md5 with blake3

Open benjones opened this issue 1 year ago • 6 comments

Add an implementation of Blake3 cryptographic hash function to replace MD5. The default hash length for blake3 is 32 bytes rather than 16, so I use a truncated version of the hash to keep the lengths the same as they were with MD5. Since Blake3 is designed to produces hashes of any desired length, I'm assuming that a shorter than default hash is at least a collision resistant as md5 (though I didn't find a guarantee of this)

The implementation here is a bit simplified from the reference implementation because it doesn't support streaming (the API is just ubyte[32] blake3(ubyte[])).

I'm assuming there are consequences of this change that I'm not aware of and hence not fixing here, so LMK what else needs to be fixed.

The rationale for this change is that MD5 is considered insecure with known weaknesses. While these are probably not exploitable in the compiler, I think it's a good PR move to dump MD5. Also, blake3 is supposed to be faster than MD5, though I haven't benchmarked this change (and some of the speedups come from being able to parallelize hashing big inputs which won't be a concern in the compiler).

Also, I'm on an ARM Mac so can't test this locally (yet, I'll work that early next week).

I assume I also need to add some copyright stuff, etc.

benjones avatar Feb 11 '24 04:02 benjones

Thanks for your pull request and interest in making D better, @benjones! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + dmd#16175"

dlang-bot avatar Feb 11 '24 04:02 dlang-bot

It would be nice to run a benchmark by compiling s huge project, to see the benefits

ryuukk avatar Feb 11 '24 11:02 ryuukk

It would be nice to run a benchmark by compiling s huge project, to see the benefits

Agree, I'll try to run something on a linux box sometime this week

benjones avatar Feb 11 '24 15:02 benjones

I don't understand this compiler error in the CI stuff? It builds/runs fine using ldc locally. Where is there an attempted implicit cast to uint?


 ulong cursor = 0;

 //greater because if it's == we still need to finalize the last
 //chunk
 while(data.length - cursor > ChunkLength){

     const ubyte[] chunk = data[cursor .. cursor + ChunkLength];
src/dmd/common/blake3.d(34): Error: cannot implicitly convert expression `cursor` of type `ulong` to `uint`
src/dmd/common/blake3.d(34): Error: cannot implicitly convert expression `cursor + 1024LU` of type `ulong` to `uint`

benjones avatar Feb 11 '24 16:02 benjones

Where is there an attempted implicit cast to uint?

It fails on x86, so when .length is 32 bits, not 64. Use size_t for cursor ?

Geod24 avatar Feb 11 '24 17:02 Geod24

cc @WalterBright

RazvanN7 avatar Feb 13 '24 16:02 RazvanN7

It's a busy week, so doubt I'll get a chance to do the experiment for a while. I agree that I doubt there will be a measurable effect, but agree it would be good to confirm that.

benjones avatar Feb 20 '24 17:02 benjones

OK, I ran a quick experiment. Using master, compiled with dmd-2.107.0 and my branch compiled with dmd-2.107.0 I timed building phobos on a linux machine. My branch took 21.33s at best, master took 21.13s at best. Not super thorough, but I think this shows that this change has negligible impact on build times.

benjones avatar Feb 28 '24 18:02 benjones

ping

benjones avatar Mar 11 '24 19:03 benjones