
Autosplitter for big generated Java methods

Open Okapist opened this issue 2 years ago • 9 comments

Fix for https://github.com/protocolbuffers/protobuf/issues/10247

All unit tests pass even with a very small kMaxFieldsInMethod=4.
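To illustrate the splitting technique, here is a hypothetical sketch of the shape the generated code takes with kMaxFieldsInMethod=4. The class, field, and helper-method names are invented for this sketch (they are not the actual generator output), and java.io.DataOutputStream stands in for protobuf's CodedOutputStream:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Illustrative only: what a generated message with 8 fields might look like
// after splitting with kMaxFieldsInMethod = 4.
final class SplitMessageSketch {
    int field1, field2, field3, field4;
    int field5, field6, field7, field8;

    // The public entry point only dispatches, so it stays small no matter
    // how many fields the message has.
    void writeTo(DataOutputStream out) throws IOException {
        writeToPart0(out);
        writeToPart1(out);
    }

    // Each helper serializes at most kMaxFieldsInMethod fields, keeping every
    // method well under javac's 64 KB bytecode-per-method limit and under
    // HotSpot's JIT threshold for huge methods.
    private void writeToPart0(DataOutputStream out) throws IOException {
        if (field1 != 0) out.writeInt(field1);
        if (field2 != 0) out.writeInt(field2);
        if (field3 != 0) out.writeInt(field3);
        if (field4 != 0) out.writeInt(field4);
    }

    private void writeToPart1(DataOutputStream out) throws IOException {
        if (field5 != 0) out.writeInt(field5);
        if (field6 != 0) out.writeInt(field6);
        if (field7 != 0) out.writeInt(field7);
        if (field8 != 0) out.writeInt(field8);
    }

    public static void main(String[] args) throws IOException {
        SplitMessageSketch m = new SplitMessageSketch();
        m.field1 = 42;
        m.field6 = 7;
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        m.writeTo(new DataOutputStream(baos));
        System.out.println(baos.size()); // two 4-byte ints -> 8
    }
}
```

The key design point is that splitting changes only the method boundaries, not the serialization order, so the wire output is byte-for-byte identical to the unsplit code.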

Round-trip benchmark (JMH body; serialize, parse back, and hand the result to a Blackhole to prevent dead-code elimination):

ByteArrayOutputStream baos = new ByteArrayOutputStream();
protoOld.writeTo(baos);

orig.protobuf_unittest.TestAnyProto.TestAny testAny =
        orig.protobuf_unittest.TestAnyProto.TestAny.parseFrom(baos.toByteArray());
blackhole.consume(testAny);


java -version
openjdk version "17.0.3" 2022-04-19
OpenJDK Runtime Environment Temurin-17.0.3+7 (build 17.0.3+7)
OpenJDK 64-Bit Server VM Temurin-17.0.3+7 (build 17.0.3+7, mixed mode, sharing)

Check small proto with 15 fields. All fields initialized.

Benchmark      Mode  Cnt    Score   Error  Units
Test.checkNew  avgt    2  836,389          ns/op
Test.checkOld  avgt    2  933,154          ns/op

Check small proto with 15 fields. 2 fields initialized, 13 empty.

Benchmark                    Mode  Cnt    Score   Error  Units
Test.checkNew15fields13empy  avgt    2  251,938          ns/op
Test.checkOld15fields13empy  avgt    2  334,046          ns/op

Check mid-size proto. Here the split code runs, and the JIT compiles both the old and the new code.

Benchmark              Mode  Cnt     Score   Error  Units
Test.checkNew70fields  avgt    2  1637,487          ns/op
Test.checkOld70fields  avgt    2  1621,035          ns/op

Check big proto, around 900 fields. Here the split code runs, but the JIT does not compile the old code.

Benchmark                    Mode  Cnt      Score   Error  Units
TestVisit.checkNew800fields  avgt    2   6239,123          ns/op
TestVisit.checkOld800fields  avgt    2  36872,855          ns/op
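A plausible explanation for the ~6x gap on the big message: by default HotSpot refuses to JIT-compile methods larger than HugeMethodLimit bytecodes (the -XX:DontCompileHugeMethods flag is true by default), so a monolithic generated method runs interpreted while the split helpers all compile. This diagnostic sketch (not part of the PR) just inspects the relevant flag on a HotSpot JVM:

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;

// Check whether this JVM skips JIT compilation of huge methods.
public class HugeMethodCheck {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption opt = bean.getVMOption("DontCompileHugeMethods");
        // Defaults to true: huge methods stay interpreted unless the JVM is
        // started with -XX:-DontCompileHugeMethods.
        System.out.println(opt.getName() + "=" + opt.getValue());
    }
}
```

Running the old benchmark with -XX:-DontCompileHugeMethods would be one way to confirm that interpretation of the huge method, rather than anything else, accounts for the slowdown.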

I see no difference, or a slight speedup, on small and medium proto files, and a great speedup on large proto files.

Okapist avatar Aug 05 '22 16:08 Okapist

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

google-cla[bot] avatar Aug 05 '22 16:08 google-cla[bot]

We will need you to sign the CLA before we can review this change.

fowles avatar Aug 05 '22 17:08 fowles

We will need you to sign the CLA before we can review this change.

CLA signed

Okapist avatar Aug 06 '22 21:08 Okapist

Just a warning that the best person to review this is out for a bit, so you should expect silence for at least a week

fowles avatar Aug 07 '22 00:08 fowles

Back in the office. Reviewing.


googleberg avatar Aug 18 '22 18:08 googleberg

@Okapist these are some intriguing results. Thank you for trying this out!

Before we go down this path of introducing complexity through splitting and continue to expand the size of generated message code, I'd like us to try comparing the performance using the Lite approach to these methods.

See GeneratedMessageLite.hashCode, .equals, .writeTo which use the Schema utility to implement these methods.

Would you be willing to take a crack at that for comparison? I have a pressing project that will likely delay me from investigating that approach myself. The goal is to balance performance with the maintenance load from having so many approaches to the problem.

googleberg avatar Aug 18 '22 19:08 googleberg

Yep. No changes. It generates exactly the same Java code for MessageLite; there is no big-method generation in the Lite approach.

Performance checks show no difference, as expected.

Benchmark                     Mode  Cnt    Score   Error  Units
Test.checkNew70fieldsWriteTo  avgt    2  810,307          ns/op
Test.checkOld70fieldsWriteTo  avgt    2  835,257          ns/op

Benchmark                       Mode  Cnt     Score   Error  Units
Test.checkNew70fieldsRoundtrip  avgt    2  1842,443          ns/op
Test.checkOld70fieldsRoundtrip  avgt    2  1968,669          ns/op

Benchmark                    Mode  Cnt  Score   Error  Units
Test.checkNew70fieldsEquals  avgt    2  0,822          ns/op
Test.checkOld70fieldsEquals  avgt    2  0,889          ns/op

Okapist avatar Aug 23 '22 10:08 Okapist

@Okapist Thanks for running this!

If I understand correctly, these are results for running the other methods with the split in place. That will be useful for the comparison I really want to run: the Lite implementation of these methods vs. the current implementation vs. the split implementation.

We could then decide if we should introduce splitting or use the Lite implementation.

googleberg avatar Aug 23 '22 16:08 googleberg

The Lite implementation is slow. Usually we use only serialize and deserialize, and Lite is 20-30% slower than the normal implementation there. Using the normal implementation for all cases except big protos would be frustrating and hard to support.

Okapist avatar Aug 29 '22 14:08 Okapist

@Okapist @googleberg Hi, I would like to know the current status of this PR. Has the autosplitter been integrated? I think this feature is still very important: big protos seriously hurt serialization and deserialization performance in Java, and when a proto has so many fields that a generated Java method exceeds 64 KB, it cannot be used at all.

ljw-hit avatar Dec 06 '23 03:12 ljw-hit