fromBuffer(bytes) in Dart is around 2 times slower than parseFrom(bytes) in Java

Open a14n opened this issue 5 years ago • 9 comments

When parsing PBF files from OpenStreetMap, it looks like reading a PrimitiveBlock is about 2 times slower in Dart compared to Java.

I attached a primitiveBlock.zip file containing a buffer (8497305 bytes) corresponding to a PrimitiveBlock.

Dart parses the bytes in 430ms:

final sw = Stopwatch()..start();
PrimitiveBlock.fromBuffer(bytes);
print(sw.elapsed);
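
For reference, a fuller runnable version of this measurement could look like the sketch below; the input path primitiveBlock.bin (the unzipped attachment) and the osmformat.pb.dart import (generated bindings for the OSM proto) are assumptions:

import 'dart:io';

import 'osmformat.pb.dart'; // hypothetical: generated Dart bindings for the OSM osmformat.proto

void main() {
  // Raw PrimitiveBlock bytes from the unzipped attachment (file name is an assumption).
  final bytes = File('primitiveBlock.bin').readAsBytesSync();
  final sw = Stopwatch()..start();
  PrimitiveBlock.fromBuffer(bytes);
  print(sw.elapsed);
}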

Java parses the bytes in 220ms:

var start = System.currentTimeMillis();
PrimitiveBlock.parseFrom(bytes);
System.out.println("time:" + (System.currentTimeMillis() - start));

Versions:

$ dart --version
Dart SDK version: 2.10.2 (stable) (Tue Oct 13 15:50:27 2020 +0200) on "linux_x64"
$ java --version
openjdk 11.0.9 2020-10-20
OpenJDK Runtime Environment (build 11.0.9+11-Ubuntu-0ubuntu1.18.04.1)
OpenJDK 64-Bit Server VM (build 11.0.9+11-Ubuntu-0ubuntu1.18.04.1, mixed mode, sharing)

protobuf-1.0.1

a14n avatar Nov 02 '20 21:11 a14n

It's actually around 6 times slower when I measure it, if you allow the JVM to warm up by calling it a couple of times. The JVM drops to around 100ms after the first iteration:

0 took 333 ms
1 took 106 ms
2 took 108 ms
3 took 98 ms
4 took 71 ms
5 took 70 ms
6 took 71 ms
7 took 107 ms
8 took 111 ms
9 took 108 ms
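
To compare the Dart side against these warmed-up numbers, the same repeated-parse measurement can be done in Dart; a minimal sketch (input path, generated bindings import, and iteration count are assumptions):

import 'dart:io';

import 'osmformat.pb.dart'; // hypothetical: generated Dart bindings for the OSM proto

void main() {
  final bytes = File('primitiveBlock.bin').readAsBytesSync();
  // Parse the same buffer ten times so later iterations reflect a warmed-up runtime.
  for (var i = 0; i < 10; i++) {
    final sw = Stopwatch()..start();
    PrimitiveBlock.fromBuffer(bytes);
    print('$i took ${sw.elapsedMilliseconds} ms');
  }
}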

Dart pays quite some price for the Int64 wrapper and also for dynamic function calls, which are used in the protobuf implementation for no good reason, but even after changing these things there is still around a 3x difference. It seems we additionally pay for the excessive generality of some code, which we achieve through closures. We should look at forcing enough inlining to eliminate any closure calls on the hot path (currently I can't achieve that purely through @pragma('vm:prefer-inline'), so an alternative might be needed).
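
To illustrate the kind of pattern meant here, a hypothetical helper in the spirit of the reader code (readPacked is not actual package:protobuf code), with the inlining pragma applied:

// Hypothetical helper: the per-element closure is the kind of generality
// that currently costs a closure call per element; the pragma asks the VM
// to inline the helper at its call sites so that call can be eliminated
// on the hot path.
@pragma('vm:prefer-inline')
void readPacked<T>(int count, T Function() readElement, List<T> into) {
  for (var i = 0; i < count; i++) {
    into.add(readElement());
  }
}

void main() {
  // Trivial usage example.
  final values = <int>[];
  var n = 0;
  readPacked(3, () => n++, values);
  print(values); // [0, 1, 2]
}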

/cc @mkustermann

mraleph avatar Nov 12 '20 14:11 mraleph

Firstly thanks a lot for the report!

Let me transfer this issue over to dart-lang/sdk and mark it as a VM issue for now. We'll investigate the performance issue.

mkustermann avatar Nov 12 '20 14:11 mkustermann

@askeksa-google Could you investigate this?

mkustermann avatar Nov 12 '20 14:11 mkustermann

Per @askeksa-google's request I have put the benchmark code I had on GitHub. The Dart and Java benchmarks are in the osm_benchmark branch of the mraleph/protobuf repo. I have also put up some of the changes I played with (e.g. removing Int64 wrappers and experimenting with avoiding closures in the reader) in the some-performance-tweaks branch of the same repo, though these changes are not validated by tests.

mraleph avatar Nov 25 '20 09:11 mraleph

@mraleph Maybe you can upstream those changes you already made? (e.g. specialise the internal int64 (de)coding logic for VM vs non-VM)

mkustermann avatar Nov 25 '20 09:11 mkustermann

I don't see a way to upstream the Int64 changes (which yield most of the improvement) as those change APIs. The changes related to declosurising the reading can probably be upstreamed, but they introduce a lot of code duplication and IIRC did not actually lead to a huge improvement - hence my original comment that additional research is needed.

mraleph avatar Nov 25 '20 09:11 mraleph

Couldn't we at least change class Int64 to allow it to be as tight as the _Mint box? Something like:

// On the web ints are doubles, so identical(0, 0.0) is true there and false on the VM.
const bool isVm = !identical(0, 0.0);

abstract class Int64 {
  static Int64 parseInt(String s) => isVm ? VmInt64.parse(s) : WebInt64.parse(s);
}

class VmInt64 implements Int64 {
  final int _value; // <- Ensure AOT can unbox this field.
  VmInt64(this._value);

  factory VmInt64.parse(String s) => VmInt64(int.parse(s));

  ...
}

class WebInt64 implements Int64 {
  final int _l;
  final int _m;
  final int _h;
  ...
}

and then make the protobuf decoder still use Int64 but specialize it for isVm?
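
One way that specialization could look in the decoder, building on the sketch above; readInt64 and WebInt64.fromHalves are hypothetical names, not the actual package:protobuf API:

// Hypothetical decoder helper: keep Int64 in the signature but pick the
// cheap single-int representation when running on the VM.
Int64 readInt64(int lo, int hi) {
  if (isVm) {
    // Both 32-bit halves fit into one native 64-bit int on the VM.
    return VmInt64((hi << 32) | (lo & 0xffffffff));
  }
  return WebInt64.fromHalves(lo, hi); // hypothetical constructor
}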

mkustermann avatar Nov 25 '20 10:11 mkustermann

@askeksa-google @mraleph We should consider adding this benchmark to golem.

mkustermann avatar Nov 25 '20 10:11 mkustermann

Interesting idea. It would still box Smis, which otherwise would just be tagged (e.g. in PbList), but it might shave off some of the performance difference. It will also require boxing in arguments and return values.

One thing I thought about is that we could add support for passing the VmInt64 box around as a value type. We could add a special annotation to it, @pragma('vm:value-type'), telling the VM that its identity does not matter. Then in code like:

VmInt64 foo(VmInt64 v) {
  return v + v;
}

the AOT compiler could completely unbox things. The box would only need to be created when crossing types (e.g. going from VmInt64 to T in List<T>). I think this is very promising and might prepare us for things like records as well.
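
For concreteness, a speculative sketch of how the proposed annotation might be applied to the VmInt64 sketch from above, and where a box would still materialize; the pragma does not exist today:

// Speculative: @pragma('vm:value-type') is the annotation proposed in this
// comment, not an existing VM pragma.
@pragma('vm:value-type')
class VmInt64 implements Int64 {
  final int _value;
  VmInt64(this._value);

  VmInt64 operator +(VmInt64 other) => VmInt64(_value + other._value);
}

VmInt64 twice(VmInt64 v) => v + v; // could stay fully unboxed under AOT

void store(List<Int64> list, VmInt64 v) {
  // Crossing from VmInt64 to T in List<T> is where a real box would still
  // need to be created.
  list.add(v);
}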

mraleph avatar Nov 25 '20 10:11 mraleph