crypto icon indicating copy to clipboard operation
crypto copied to clipboard

Abysmal performance for hash algorithms in dart:crypto

Open DartBot opened this issue 10 years ago • 13 comments

Originally opened as dart-lang/sdk#4611

This issue was originally filed by [email protected]


Test code: new File('recordroyale.ogg').readAsBytes().then((buf) {   var md5 = new MD5();   md5.update(buf);   print(md5.digest()); });

where 'recordroyale.ogg' is about six megabytes. This takes 22 seconds on my machine. Shelling out to md5sum(1) takes under 0.02 seconds. I understand that Dart won't ever be as fast as C, but with this disparity, the hash algorithms should be implemented in C[++] rather than Dart.

I had a vague suspicion that this was slow due to unnecessary allocations, but the allocations are mostly small and consumed quickly. Refactoring to eliminate unnecessary allocations resulted in no change.

My use case involved checksumming a byte array that existed only in memory at the relevant location; it is far less convenient to write data to a temporary file, shell out to md5sum(1), then delete that file.

DartBot avatar Jun 05 '15 22:06 DartBot

<img src="https://avatars.githubusercontent.com/u/5858078?v=3" align="left" width="48" height="48"hspace="10"> Comment by larsbak


As a starting point we should add a benchmark to trace performance of MD5:

#import("dart:io"); #import("dart:crypto");

main() {   new File('data').readAsBytes().then((buf) {     var md5 = new MD5();     md5.update(buf);     print(md5.digest());   }); }


Set owner to @madsager. Added Area-Library label.

DartBot avatar Jun 05 '15 22:06 DartBot

<img src="https://avatars.githubusercontent.com/u/2909286?v=3" align="left" width="48" height="48"hspace="10"> Comment by madsager


There is a lot that can be done to improve performance here. The data is being copied around way too much. Also, the code overflows smi range and (at least it used to be the case that) a lot of the medium-size integer operations ended up in runtime calls.


cc @iposva-google. cc @sgmitrovic.

DartBot avatar Jun 05 '15 22:06 DartBot

<img src="https://avatars.githubusercontent.com/u/5449880?v=3" align="left" width="48" height="48"hspace="10"> Comment by iposva-google


Mads, let's work together at creating a repeatable test case. I don't think file reading is really needed here.

DartBot avatar Jun 05 '15 22:06 DartBot

<img src="https://avatars.githubusercontent.com/u/4865287?v=3" align="left" width="48" height="48"hspace="10"> Comment by lrhn


Removed Area-Library label. Added Area-Pkg, Library-Crypto labels.

DartBot avatar Jun 05 '15 22:06 DartBot

<img src="https://avatars.githubusercontent.com/u/5449880?v=3" align="left" width="48" height="48"hspace="10"> Comment by iposva-google


I wrote a little test just now and it looks like we can close this bug. Abysmal is certainly not what we see anymore:

import "dart:io"; import "package:crypto/crypto.dart";

readFileFully(var path) {   return new File(path).readAsBytesSync(); }

main() {   var binary = new Options().executable;   print("Using $binary");   for (int i = 0; i < 3; i++) {     var stopwatch = new Stopwatch()..start();     var bytes = readFileFully(binary);     stopwatch.stop();     print("Read ${bytes.length} bytes in ${stopwatch.elapsedMilliseconds} ms.");

    // Now compute the MD5 hash on the read bytes.     stopwatch = new Stopwatch()..start();     var md5 = new MD5();     md5.add(bytes);     var hash = md5.close();     stopwatch.stop();     print("Hashed ${bytes.length} bytes into ${hash.length} bytes in "           "${stopwatch.elapsedMilliseconds} ms.");     var sb = new StringBuffer();     for (int j = 0; j < hash.length; j++) {       var val = hash[j].toRadixString(16);       sb.write(val.length == 1 ? "0$val" : val);     }     print("Hash: $sb");   } }

Gives this set of results on my MacBook:

dalbe[runtime] ./xcodebuild/ReleaseX64/dart --package-root=./xcodebuild/ReleaseIA32/packages/ ~/dart/Bug4611.dart Using ./xcodebuild/ReleaseX64/dart Read 9875196 bytes in 66 ms. Hashed 9875196 bytes into 16 bytes in 810 ms. Hash: e5f175b851495b54b393327ac934583a Read 9875196 bytes in 18 ms. Hashed 9875196 bytes into 16 bytes in 802 ms. Hash: e5f175b851495b54b393327ac934583a Read 9875196 bytes in 33 ms. Hashed 9875196 bytes into 16 bytes in 816 ms. Hash: e5f175b851495b54b393327ac934583a

dalbe[runtime] time md5 ./xcodebuild/ReleaseX64/dart MD5 (./xcodebuild/ReleaseX64/dart) = e5f175b851495b54b393327ac934583a 0.026u 0.007s 0:00.02 100.0% 0+0k 0+1io 0pf+0w

This means we calculate the MD5 hash for a 9.41MB file in 810ms, which is significantly better than the reported 6MB in 22 seconds.


Added AssumedStale label.

DartBot avatar Jun 05 '15 22:06 DartBot

<img src="https://avatars.githubusercontent.com/u/17034?v=3" align="left" width="48" height="48"hspace="10"> Comment by kevmoo


Removed Library-Crypto label. Added Pkg-Crypto label.

DartBot avatar Jun 05 '15 22:06 DartBot

This may be a slightly contrived example, but I was recently comparing dart performance to python performance on a set of 'programming challenges' and am still seeing VERY weak performance from dart's MD5 hashing. the following code computes about 10 million hashes, and takes roughly 18-20 seconds on my macbook, whereas the equivalent python code (using hashlib) runs in under 5 second.

import "dart:io";
import "package:crypto/crypto.dart";

main(List<String> args) {
  MD5 md5;
  String text;
  String privKey = "yzbqklnj";
  String digest = "";
  List<int> bytes;
  int counter = 0;
  while (!digest.startsWith("000000")) {
    counter ++;
    text = privKey + counter.toString();
    md5 = new MD5();
    md5.add(text.codeUnits);
    bytes = md5.close();
    digest = CryptoUtils.bytesToHex(bytes);
  }
  print(counter);
}

of course hashlib is implemented in C, so the speed is to be expected, but it also clearly demonstrates that dart is nowhere near the same ballpark without implementing this stuff in C.

andreleblanc-wf avatar Dec 04 '15 22:12 andreleblanc-wf

Thanks for the report. Would you mind providing the version of Dart that you used? Did you have checked mode turned on? And, what version of crypto?

sethladd avatar Dec 04 '15 23:12 sethladd

I actually just updated my comment, seems there was an issue with my initial python run, and the real number is closer to 5 seconds vs. dart at 20 seconds. this is using dart 1.13.0. The numbers are a little more reasonable now, but it's still a significant performance hit.

andreleblanc-wf avatar Dec 05 '15 05:12 andreleblanc-wf

Thanks for the additional comments. I'm not sure this issue should be closed. Reopening.

cc @sgmitrovic if he's curious.

sethladd avatar Dec 07 '15 16:12 sethladd

I see that *_MD5Sink.updateHash allocates millions on _Mint objects, which causes GC, and probably slow performance. I compute MD5 of a fairly small list of bytes, and MD5 computation takes almost 50% of total time in my app. Why isn't it inlined to use unboxed values?!

scheglov avatar Oct 21 '16 03:10 scheglov

Is this issue still being worked on? I'm still experiencing ~10x slower performance on MD5 compared to the native shell command line for large files (in my example, 121MB).

On my laptop (Ubuntu 19.10):

$ dart2native dart_md5.dart -o dart_md5
$ time dart_md5 video.mpg
54cd56fa1fcf144b21d48357e4e694a3

real	0m1,498s
user	0m1,584s
sys	0m0,038s

$ time md5sum video.mpg
54cd56fa1fcf144b21d48357e4e694a3  video.mpg

real	0m0,191s
user	0m0,187s
sys	0m0,004s

On my Raspberry Pi (raspbian):

$ dart2native dart_md5.dart -o dart_md5
$ time dart_md5 video.mpg
54cd56fa1fcf144b21d48357e4e694a3

real	0m7.923s
user	0m7.972s
sys	0m0.181s

$ time md5sum video.mpg
54cd56fa1fcf144b21d48357e4e694a3  video.mpg

real	0m0.646s
user	0m0.514s
sys	0m0.133s

Here's the relevant part of my code:

  static Future<String> _hash(File file) async =>
      (await md5.bind(file.openRead()).first).toString();

Dart VM version: 2.7.0

cachapa avatar Dec 15 '19 13:12 cachapa