Autosplitter for big generated java methods
Fixes https://github.com/protocolbuffers/protobuf/issues/10247
All unit tests pass with a very small kMaxFieldsInMethod=4.
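The splitting idea, in miniature (class and method names below are illustrative, not actual protoc output): instead of emitting one huge method over all fields, the generator emits numbered helper methods covering at most kMaxFieldsInMethod fields each, plus a small driver that calls them in order.

```java
// Illustrative sketch of the autosplitting strategy: a driver method
// delegates to numbered helpers, each covering at most
// kMaxFieldsInMethod (here 4) fields, so no single method's bytecode
// grows past JVM/JIT limits. Names are hypothetical, not protoc output.
public class SplitSketch {
    private final int f1 = 1, f2 = 2, f3 = 3, f4 = 4, f5 = 5, f6 = 6;

    int hashCodeSplit() {
        int hash = 41;
        hash = hashCodePart0(hash); // fields 1..4
        hash = hashCodePart1(hash); // fields 5..6
        return hash;
    }

    private int hashCodePart0(int hash) {
        hash = 53 * hash + f1;
        hash = 53 * hash + f2;
        hash = 53 * hash + f3;
        hash = 53 * hash + f4;
        return hash;
    }

    private int hashCodePart1(int hash) {
        hash = 53 * hash + f5;
        hash = 53 * hash + f6;
        return hash;
    }

    // Unsplit equivalent, for comparison: same result, one big method.
    int hashCodeFlat() {
        int hash = 41;
        hash = 53 * hash + f1;
        hash = 53 * hash + f2;
        hash = 53 * hash + f3;
        hash = 53 * hash + f4;
        hash = 53 * hash + f5;
        hash = 53 * hash + f6;
        return hash;
    }

    public static void main(String[] args) {
        SplitSketch s = new SplitSketch();
        System.out.println(s.hashCodeSplit() == s.hashCodeFlat()); // prints true
    }
}
```

The split and flat versions perform the same operations in the same order, so the results are identical; only the method boundaries differ.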
Round-trip benchmark:

```java
ByteArrayOutputStream baos = new ByteArrayOutputStream();
protoOld.writeTo(baos);
orig.protobuf_unittest.TestAnyProto.TestAny testAny =
    orig.protobuf_unittest.TestAnyProto.TestAny.parseFrom(baos.toByteArray());
blackhole.consume(testAny);
```
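For readers without the generated test protos at hand, the shape of what the benchmark body measures can be sketched with the standard library alone (DataOutputStream stands in for the protobuf-generated writeTo/parseFrom, so this compiles without the protobuf runtime):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Stdlib-only analog of the round-trip benchmark body: serialize a
// value to a byte array, then parse it back. In the real benchmark the
// write/read calls are protobuf-generated code; DataOutputStream is a
// stand-in so the sketch is self-contained.
public class RoundTrip {
    static byte[] write(long value) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeLong(value);
        }
        return baos.toByteArray();
    }

    static long read(byte[] bytes) throws IOException {
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes))) {
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        long parsed = read(write(42L));
        System.out.println(parsed); // prints 42
    }
}
```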
java -version
openjdk version "17.0.3" 2022-04-19
OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7)
OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode, sharing)
Check small proto with 15 fields. All fields initialized.
Benchmark Mode Cnt Score Error Units
Test.checkNew avgt 2 836,389 ns/op
Test.checkOld avgt 2 933,154 ns/op
Check small proto with 15 fields. 2 fields initialized, 13 empty.
Benchmark Mode Cnt Score Error Units
Test.checkNew15fields13empy avgt 2 251,938 ns/op
Test.checkOld15fields13empy avgt 2 334,046 ns/op
Check mid-size proto. Here the split code kicks in, and the JIT compiles both the old and the new code.
Benchmark Mode Cnt Score Error Units
Test.checkNew70fields avgt 2 1637,487 ns/op
Test.checkOld70fields avgt 2 1621,035 ns/op
Check big proto, around 900 fields. Here the split code kicks in, and the JIT does not compile the old code.
Benchmark Mode Cnt Score Error Units
TestVisit.checkNew800fields avgt 2 6239,123 ns/op
TestVisit.checkOld800fields avgt 2 36872,855 ns/op
I see no difference or a slight speed increase on small and medium protos, and a large speed increase on big protos.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
View this failed invocation of the CLA check for more information.
For the most up to date status, view the checks section at the bottom of the pull request.
We will need you to sign the CLA before we can review this change.
CLA signed
Just a warning that the best person to review this is out for a bit, so you should expect silence for at least a week
Back in the office. Reviewing.
@Okapist these are some intriguing results. Thank you for trying this out!
Before we go down this path of introducing complexity through splitting and continue to expand the size of generated message code, I'd like us to try comparing the performance using the Lite approach to these methods.
See GeneratedMessageLite.hashCode, .equals, .writeTo which use the Schema utility to implement these methods.
Would you be willing to take a crack at that for comparison? I have a pressing project that will likely delay me from investigating that approach myself. The goal is to balance performance with the maintenance load from having so many approaches to the problem.
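For intuition only (this is not the real protobuf Schema API), the Lite/Schema approach is table-driven: a runtime loop over per-field accessors replaces the straight-line per-field code that full generated messages emit. A minimal sketch of the two shapes:

```java
import java.util.List;
import java.util.function.BiPredicate;

// Toy contrast, not the real protobuf Schema API: a table-driven
// equals (Lite/Schema style, one loop over per-field predicates)
// versus straight-line generated equals (full runtime style). The
// per-field indirection in the loop is one reason a generic
// implementation can trail generated code.
public class SchemaSketch {
    record Msg(int a, long b, String c) {}

    // "Schema": a runtime table of field comparisons.
    static final List<BiPredicate<Msg, Msg>> FIELDS = List.of(
        (x, y) -> x.a() == y.a(),
        (x, y) -> x.b() == y.b(),
        (x, y) -> x.c().equals(y.c())
    );

    static boolean schemaEquals(Msg x, Msg y) {
        for (BiPredicate<Msg, Msg> f : FIELDS) {
            if (!f.test(x, y)) return false;
        }
        return true;
    }

    // "Generated": straight-line, one inlined check per field.
    static boolean generatedEquals(Msg x, Msg y) {
        return x.a() == y.a() && x.b() == y.b() && x.c().equals(y.c());
    }

    public static void main(String[] args) {
        Msg m = new Msg(1, 2L, "x");
        Msg same = new Msg(1, 2L, "x");
        Msg diff = new Msg(1, 3L, "x");
        // Both implementations must agree on equal and unequal inputs.
        System.out.println(schemaEquals(m, same) == generatedEquals(m, same));
        System.out.println(schemaEquals(m, diff) == generatedEquals(m, diff));
    }
}
```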
Yep, no changes. It generates exactly the same Java code for MessageLite; there is no big-method generation in the Lite approach.
The performance check shows no difference, as expected.
Benchmark Mode Cnt Score Error Units
Test.checkNew70fieldsWriteTo avgt 2 810,307 ns/op
Test.checkOld70fieldsWriteTo avgt 2 835,257 ns/op
Benchmark Mode Cnt Score Error Units
Test.checkNew70fieldsRoundtrip avgt 2 1842,443 ns/op
Test.checkOld70fieldsRoundtrip avgt 2 1968,669 ns/op
Benchmark Mode Cnt Score Error Units
Test.checkNew70fieldsEquals avgt 2 0,822 ns/op
Test.checkOld70fieldsEquals avgt 2 0,889 ns/op
@Okapist Thanks for running this!
If I understand correctly, these are results for running the other methods with the split in place. That will be useful for the comparison I really want to run: the Lite implementation of these methods vs. the current implementation vs. the split implementation.
We could then decide if we should introduce splitting or use the Lite implementation.
The Lite implementation is slow. Usually we only serialize and deserialize, and Lite is 20-30% slower than the normal code there. Using the normal implementation everywhere except for big protos would be frustrating and hard to support.
@Okapist @googleberg Hi, I would like to know the current status of this PR. Has the autosplitter been integrated? I think this feature is still very important: big protos seriously hurt serialization and deserialization performance in Java. And if a proto has so many fields that a generated Java method exceeds 64 KB of bytecode, it cannot be used at all.
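For reference, the 64 KB limit mentioned above is the JVM's per-method bytecode cap; javac itself rejects methods past it with a "code too large" error. A small self-contained demonstration (it writes a throwaway Huge.java to the working directory and requires a JDK, since it invokes the system compiler):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Demonstrates the 64 KB-per-method bytecode cap: a method with
// ~20000 generated statements compiles to well over 64 KB of
// bytecode, so javac fails with "code too large" and the compiler
// run returns a nonzero exit code.
public class CodeTooLarge {
    public static void main(String[] args) throws IOException {
        StringBuilder src = new StringBuilder("class Huge { int m() { int x = 0; ");
        for (int i = 0; i < 20000; i++) {
            src.append("x += ").append(i).append("; ");
        }
        src.append("return x; } }");
        Path file = Path.of("Huge.java");
        Files.writeString(file, src.toString());
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        // Diagnostics ("code too large") go to stderr; rc is nonzero.
        int rc = javac.run(null, null, null, file.toString());
        System.out.println("compile exit code: " + rc);
    }
}
```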