Add benchmark for ReadResponseMarshaller
What changes are proposed in this pull request?
Added a benchmark for the marshalling/unmarshalling performance of ReadResponseMarshaller
in comparison with the baseline marshaller implementation, MessageMarshaller.
Why are the changes needed?
Shed light on the performance characteristics of alluxio's zero-copy implementation
Does this PR introduce any user facing changes?
No
[Image: Screen Shot 2022-06-29 at 8 31 35 PM]
The zero-copy implementation does improve a lot over the read method of the baseline implementation, which constructs a new ByteArrayInputStream over the protobuf entity. However, drainTo in the baseline implementation does not have this copy in the first place.
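To make the distinction above concrete, here is a minimal sketch using only the JDK. FakeMessage is a hypothetical stand-in for the protobuf entity (it is not a protobuf or grpc-java type), and viaRead/viaDrain are illustrative names, not methods from the benchmark. The read-style path serializes the message into a temporary array before streaming it, while the drainTo-style path writes straight to the target.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class ReadVsDrain {
  /** Hypothetical stand-in for a protobuf message; not a real library type. */
  static final class FakeMessage {
    private final byte[] payload;
    int copies = 0; // number of times the payload was copied into a temporary array

    FakeMessage(byte[] payload) {
      this.payload = payload;
    }

    /** Serializes into a fresh array: one full copy of the payload. */
    byte[] toByteArray() {
      copies++;
      return payload.clone();
    }

    /** Writes the payload directly to the target without a temporary array. */
    void writeTo(OutputStream out) throws IOException {
      out.write(payload);
    }
  }

  /** read-style path: materialize the message first, then stream the copy. */
  static byte[] viaRead(FakeMessage msg) throws IOException {
    InputStream in = new ByteArrayInputStream(msg.toByteArray());
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1) {
      sink.write(buf, 0, n);
    }
    return sink.toByteArray();
  }

  /** drainTo-style path: the message writes itself into the sink. */
  static byte[] viaDrain(FakeMessage msg) throws IOException {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    msg.writeTo(sink);
    return sink.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    FakeMessage msg = new FakeMessage(new byte[] {1, 2, 3, 4});
    byte[] a = viaRead(msg);
    byte[] b = viaDrain(msg);
    System.out.println("same bytes: " + Arrays.equals(a, b));
    System.out.println("temporary copies made: " + msg.copies);
  }
}
```

Both paths deliver the same bytes; only the read-style path pays for the intermediate array, which is the copy a zero-copy marshaller can save relative to read but not relative to drainTo.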
[Image: Screen Shot 2022-06-29 at 8 49 51 PM]
I've run some experimental gRPC read benchmarks using https://github.com/TachyonNexus/spark-dfsio on my MacBook. The results suggest that there's no significant performance gain from using zero-copy, either on the worker or on the client.
- Hardware: 2.3 GHz 8-Core Intel Core i9, 16 GB 2667 MHz DDR4
- Alluxio version: current master
- Spark version: 3.3.0
- Job command:
./bin/spark-submit \
--class alluxio.benchmarks.TestDFSIO \
--conf "spark.scheduler.minRegisteredResourcesRatio=1" \
--conf "spark.scheduler.maxRegisteredResourcesWaitingTime=60s" \
--conf "spark.executor.extraJavaOptions=-Dalluxio.user.block.size.bytes.default=128MB -Dalluxio.user.file.readtype.default=NO_CACHE -Dalluxio.user.file.writetype.default=MUST_CACHE -Dalluxio.user.short.circuit.enabled=false -Dalluxio.user.streaming.zerocopy.enabled=false" \
benchmarks-1.0.0-SNAPSHOT-jar-with-dependencies.jar -p 4 -s 1000 -o wr -b alluxio://192.168.2.20:19998/testdfsio/
Following are test results under different configurations:
Short Circuit Enabled | Client ZeroCopy Enabled | Worker ZeroCopy Enabled | Read Throughput (MB/s) | Write Throughput (MB/s) |
---|---|---|---|---|
Yes | - | - | ~266 | ~591 |
No | No | No | ~250 | ~455 |
No | Yes | Yes | ~278 | ~482 |
No | No | Yes | ~288 | ~496 |
Though it's not a statistically solid test, my feeling is that the variations are normal fluctuations rather than an effect of the zero-copy implementation.
Re-ran with ByteArrayOutputStream as the final consumer of the serialized stream. It can be seen that marshalZeroCopy is still not better than marshalBaselineDrain:
[Image: Screen Shot 2022-08-17 at 5 01 29 PM]
After a second look at the code, I realized I made a big mistake in this benchmark, which is why the performance gain cannot be seen: DataMessageMarshaller is designed to work with a specific OutputStream, i.e., BufferChainOutputStream in grpc-java's internal package. This stream keeps track of a list of buffer references, and DataMessageMarshaller avoids the copy by appending the buffer reference it holds directly to that internal list of BufferChainOutputStream.
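The mechanism described above can be sketched roughly as follows. This is only an illustration under stated assumptions: ChainStream is a hypothetical stand-in for grpc-java's internal BufferChainOutputStream, and drainTo here is an illustrative helper, not Alluxio's actual DataMessageMarshaller code. It shows why a benchmark that drains into a ByteArrayOutputStream never reaches the zero-copy branch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class BufferChainSketch {
  /** Hypothetical stand-in for a stream that tracks a list of buffer references. */
  static final class ChainStream extends OutputStream {
    final List<ByteBuffer> chain = new ArrayList<>();

    @Override
    public void write(int b) {
      chain.add(ByteBuffer.wrap(new byte[] {(byte) b}));
    }

    @Override
    public void write(byte[] src, int off, int len) {
      // Ordinary writes copy the bytes into a buffer owned by the stream.
      byte[] copy = new byte[len];
      System.arraycopy(src, off, copy, 0, len);
      chain.add(ByteBuffer.wrap(copy));
    }

    /** Zero-copy entry point: keep the caller's buffer by reference. */
    void appendReference(ByteBuffer buf) {
      chain.add(buf);
    }
  }

  /**
   * Drains data into target. Returns true when the zero-copy branch was taken,
   * false when the bytes had to be copied through the generic write call.
   */
  static boolean drainTo(ByteBuffer data, OutputStream target) throws IOException {
    if (target instanceof ChainStream) {
      ((ChainStream) target).appendReference(data);
      return true;
    }
    // Fallback for any other stream: copy the bytes out of the buffer.
    byte[] copy = new byte[data.remaining()];
    data.duplicate().get(copy);
    target.write(copy);
    return false;
  }

  public static void main(String[] args) throws IOException {
    ByteBuffer data = ByteBuffer.wrap(new byte[] {1, 2, 3});
    System.out.println("chain stream zero-copy: " + drainTo(data, new ChainStream()));
    System.out.println("byte array stream zero-copy: "
        + drainTo(data, new ByteArrayOutputStream()));
  }
}
```

With any consumer other than the chain-style stream, the marshaller falls back to a regular copy, so the measured cost matches or exceeds the baseline drainTo path.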