
Improve performance of ProtobufHttpMessageConverter

Open koo-taejin opened this issue 2 years ago • 2 comments

Improved performance for ProtobufHttpMessageConverter

Currently, when a binary Protobuf message is received, ProtobufHttpMessageConverter creates the data through a Builder. However, each generated com.google.protobuf.Message class also provides a static PROTO_MESSAGE parseFrom(java.io.InputStream input, com.google.protobuf.ExtensionRegistryLite extensionRegistry) method. If ProtobufHttpMessageConverter used this method instead, it could avoid creating an unnecessary Builder and executing the merge step. (I have also confirmed that grpc-java uses this method when creating com.google.protobuf.Message instances.)
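To illustrate the two construction paths side by side, here is a minimal, self-contained sketch. DemoProto is a hypothetical stand-in for a generated protobuf class, not code from this PR; in real protobuf-java the as-is path is newBuilder().mergeFrom(in).build() and the to-be path is the generated static parseFrom(in):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// DemoProto imitates a generated protobuf message class so the two
// construction paths can be compared without a protobuf dependency.
public class BuilderVsParseFrom {

    static class DemoProto {
        final String value;
        private DemoProto(String value) { this.value = value; }

        // As-is path: a mutable Builder is allocated, merged into, then built.
        static class Builder {
            private String value = "";
            Builder mergeFrom(InputStream in) throws Exception {
                this.value = new String(in.readAllBytes());
                return this;
            }
            DemoProto build() { return new DemoProto(value); }
        }
        static Builder newBuilder() { return new Builder(); }

        // To-be path: parse straight into the immutable message,
        // skipping the intermediate Builder allocation and merge step.
        static DemoProto parseFrom(InputStream in) throws Exception {
            return new DemoProto(new String(in.readAllBytes()));
        }
    }

    public static void main(String[] args) throws Exception {
        DemoProto viaBuilder = DemoProto.newBuilder()
                .mergeFrom(new ByteArrayInputStream("a".getBytes())).build();
        DemoProto direct = DemoProto.parseFrom(new ByteArrayInputStream("a".getBytes()));
        System.out.println(viaBuilder.value.equals(direct.value));  // prints "true"
    }
}
```

Both paths produce the same message; the difference is only the intermediate Builder object, which is what the proposal aims to avoid on the hot path.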

After switching object creation to the parseFrom method, I ran several simple performance tests and confirmed that TPS increased by about 5%. (I also confirmed that the rate of heap growth improved.)

  • as-is
h2load http://localhost:8080/user/proto -d ./proto.txt --header 'Content-Type: application/x-protobuf;charset=UTF-8'  -n 100000 -c 1 -m 1
starting benchmark...
spawning thread #0: 1 total client(s). 100000 total requests
Application protocol: h2c
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 24.07s, 4155.29 req/s, 146.15KB/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3.43MB (3601612) total, 488.93KB (500660) headers (space savings 96.26%), 390.63KB (400000) data
                     min         max         mean         sd        +/- sd
time for request:      187us     13.12ms       234us       105us    94.49%
time for connect:      301us       301us       301us         0us   100.00%
time to 1st byte:     2.11ms      2.11ms      2.11ms         0us   100.00%
req/s           :    4155.31     4155.31     4155.31        0.00   100.00%


  • to-be
h2load http://localhost:8080/user/proto -d ./proto.txt --header 'Content-Type: application/x-protobuf;charset=UTF-8'  -n 100000 -c 1 -m 1
starting benchmark...
spawning thread #0: 1 total client(s). 100000 total requests
Application protocol: h2c
progress: 10% done
progress: 20% done
progress: 30% done
progress: 40% done
progress: 50% done
progress: 60% done
progress: 70% done
progress: 80% done
progress: 90% done
progress: 100% done

finished in 23.31s, 4289.18 req/s, 150.86KB/s
requests: 100000 total, 100000 started, 100000 done, 100000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 100000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 3.43MB (3601566) total, 488.88KB (500614) headers (space savings 96.26%), 390.63KB (400000) data
                     min         max         mean         sd        +/- sd
time for request:      177us    274.17ms       227us       876us    99.97%
time for connect:      331us       331us       331us         0us   100.00%
time to 1st byte:   254.55ms    254.55ms    254.55ms         0us   100.00%
req/s           :    4289.20     4289.20     4289.20        0.00   100.00%


Unlike the builder approach, there is a risk that the parseFrom method may not be present on a given message class, so I added a map that caches the resolved method per class.
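The caching idea can be sketched as follows (a minimal, self-contained illustration, not the actual PR code; DemoMessage is a hypothetical stand-in for a generated protobuf class, and the real converter would target com.google.protobuf.Message subclasses and the two-argument parseFrom overload):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Resolve each message class's static parseFrom(InputStream) method once
// via reflection and cache it, so the lookup cost is paid only on first use.
public class ParseFromCache {

    private static final Map<Class<?>, Method> methodCache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    static <T> T parse(Class<T> clazz, InputStream input) throws Exception {
        // computeIfAbsent keeps the reflective lookup off the hot path.
        Method parseFrom = methodCache.computeIfAbsent(clazz, c -> {
            try {
                return c.getMethod("parseFrom", InputStream.class);
            }
            catch (NoSuchMethodException ex) {
                // The method presence risk mentioned above: fail fast if absent.
                throw new IllegalStateException("No parseFrom method on " + c, ex);
            }
        });
        return (T) parseFrom.invoke(null, input);  // static method, no receiver
    }

    // Minimal stand-in for a generated message type.
    public static class DemoMessage {
        final String payload;
        DemoMessage(String payload) { this.payload = payload; }

        public static DemoMessage parseFrom(InputStream input) throws Exception {
            return new DemoMessage(new String(input.readAllBytes()));
        }
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream("hello".getBytes());
        DemoMessage msg = parse(DemoMessage.class, in);
        System.out.println(msg.payload);  // prints "hello"
    }
}
```

A ConcurrentHashMap is used here so concurrent requests for the same message type resolve the method at most a handful of times and then always hit the cache.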

Please feel free to let me know if I misjudged anything. :)

koo-taejin avatar Nov 16 '22 03:11 koo-taejin

@koo-taejin Please sign the Contributor License Agreement!


pivotal-cla avatar Nov 16 '22 03:11 pivotal-cla

@koo-taejin Thank you for signing the Contributor License Agreement!

pivotal-cla avatar Nov 16 '22 03:11 pivotal-cla

Thanks for this contribution, and sorry for taking so long to get back to you. I've introduced a JMH benchmark in 31a62ff8ba for the converter (covering both reading and writing operations). Maybe the runtime profile has changed since your PR, but I'm not seeing improvements with your change (it may actually have made things a bit worse).

Running with java -jar spring-web/build/libs/spring-web-6.1.0-SNAPSHOT-jmh.jar -t 30 -f 2 -prof gc ProtobufHttpMessageConverterBenchmark

Without this change:

Benchmark                                                                (messageCount)   Mode  Cnt       Score     Error   Units
ProtobufHttpMessageConverterBenchmark.readMessages                                   40  thrpt   10   66492.262 ± 397.655   ops/s
ProtobufHttpMessageConverterBenchmark.readMessages:·gc.alloc.rate                    40  thrpt   10   12079.125 ±  72.156  MB/sec
ProtobufHttpMessageConverterBenchmark.readMessages:·gc.alloc.rate.norm               40  thrpt   10  190556.022 ± 758.571    B/op
ProtobufHttpMessageConverterBenchmark.readMessages:·gc.count                         40  thrpt   10    1645.000            counts
ProtobufHttpMessageConverterBenchmark.readMessages:·gc.time                          40  thrpt   10    1412.000                ms
ProtobufHttpMessageConverterBenchmark.writeMessages                                  40  thrpt   10   48658.466 ± 715.484   ops/s
ProtobufHttpMessageConverterBenchmark.writeMessages:·gc.alloc.rate                   40  thrpt   10   12076.030 ± 195.945  MB/sec
ProtobufHttpMessageConverterBenchmark.writeMessages:·gc.alloc.rate.norm              40  thrpt   10  260480.029 ±   0.001    B/op
ProtobufHttpMessageConverterBenchmark.writeMessages:·gc.count                        40  thrpt   10    1791.000            counts
ProtobufHttpMessageConverterBenchmark.writeMessages:·gc.time                         40  thrpt   10    1427.000                ms

With this change:

Benchmark                                                                (messageCount)   Mode  Cnt       Score      Error   Units
ProtobufHttpMessageConverterBenchmark.readMessages                                   40  thrpt   10   64359.735 ± 1712.806   ops/s
ProtobufHttpMessageConverterBenchmark.readMessages:·gc.alloc.rate                    40  thrpt   10   11660.432 ±  306.253  MB/sec
ProtobufHttpMessageConverterBenchmark.readMessages:·gc.alloc.rate.norm               40  thrpt   10  190072.022 ±   12.749    B/op
ProtobufHttpMessageConverterBenchmark.readMessages:·gc.count                         40  thrpt   10    1996.000             counts
ProtobufHttpMessageConverterBenchmark.readMessages:·gc.time                          40  thrpt   10    1664.000                 ms
ProtobufHttpMessageConverterBenchmark.writeMessages                                  40  thrpt   10   48545.755 ±  362.536   ops/s
ProtobufHttpMessageConverterBenchmark.writeMessages:·gc.alloc.rate                   40  thrpt   10   12052.676 ±   90.700  MB/sec
ProtobufHttpMessageConverterBenchmark.writeMessages:·gc.alloc.rate.norm              40  thrpt   10  260512.030 ±    0.001    B/op
ProtobufHttpMessageConverterBenchmark.writeMessages:·gc.count                        40  thrpt   10    2296.000             counts
ProtobufHttpMessageConverterBenchmark.writeMessages:·gc.time                         40  thrpt   10    1743.000                 ms

In light of that, I'm declining this PR for now. We can always consider other changes if they're backed by benchmark results. Thanks!

bclozel avatar Oct 20 '23 14:10 bclozel