
Optimizing reading and writing performance

Open · JulySouthWind opened this issue 2 years ago · 7 comments

As a popular distributed cache system, Alluxio performs well when reading and writing small amounts of data, but in large-scale read and write scenarios its performance is unsatisfactory. We replaced gRPC with Netty as the underlying data-transfer layer to improve read and write performance. Protobuf serialization consumes too much CPU time, so we exploit the zero-copy property of off-heap memory to send and receive data, stripping the data bytes out of the protobuf messages. In addition, we propose a customized thread pool model that guarantees requests for a single file or data block are processed sequentially. The detailed design is shown in the following document: https://docs.google.com/document/d/1A63xwD_vQtsBYg2AG2iN6q2B0NMMk0how8cD75EurAE/edit#
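
To make the proposal concrete, here is a minimal sketch of the two ideas combined, under my own assumptions rather than code from the design doc: a Netty handler receives each payload as a (typically off-heap, pooled) ByteBuf, so the bytes are never copied into a protobuf message, and dispatches it to a single-threaded lane keyed by block ID so that requests for one block run strictly in order. Names such as `BlockWriteHandler` and `writeToBlockStore` are illustrative, not from the Alluxio codebase.

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical server-side handler: Netty delivers payloads as off-heap
 * ByteBufs, so the data bytes bypass protobuf serialization entirely.
 * Each chunk is dispatched to a lane chosen by block ID, so writes to the
 * same block are applied sequentially.
 */
public class BlockWriteHandler extends ChannelInboundHandlerAdapter {
  private static final int NUM_LANES = 16;
  private static final ExecutorService[] LANES = new ExecutorService[NUM_LANES];
  static {
    for (int i = 0; i < NUM_LANES; i++) {
      // One thread per lane: tasks submitted to a lane run strictly in order.
      LANES[i] = Executors.newSingleThreadExecutor();
    }
  }

  @Override
  public void channelRead(ChannelHandlerContext ctx, Object msg) {
    ByteBuf buf = (ByteBuf) msg;
    long blockId = buf.readLong(); // assume an 8-byte block-ID header
    int lane = (Long.hashCode(blockId) & 0x7fffffff) % NUM_LANES;
    LANES[lane].submit(() -> {
      try {
        writeToBlockStore(blockId, buf); // hypothetical storage call
      } finally {
        buf.release(); // return the pooled off-heap buffer
      }
    });
  }

  private void writeToBlockStore(long blockId, ByteBuf data) {
    // Placeholder: append data to the local block store for blockId.
  }
}
```

Because a given block ID always hashes to the same lane, per-block ordering holds without a global lock, while different blocks still proceed in parallel across lanes.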

JulySouthWind avatar Jun 15 '22 07:06 JulySouthWind

mark

jja725 avatar Jun 15 '22 18:06 jja725

According to my understanding, you want to remove the gRPC framework from Alluxio? That would have a tremendous impact on Alluxio.

boobpoop avatar Jun 28 '22 03:06 boobpoop

Firstly, we only want to replace gRPC in the data read and write path. Replacing gRPC everywhere would take a long time, and whether to do that depends on whether it actually proves beneficial.

JulySouthWind avatar Jun 28 '22 03:06 JulySouthWind

> Firstly, we only want to replace gRPC in the data read and write path. Replacing gRPC everywhere would take a long time, and whether to do that depends on whether it actually proves beneficial.

Sounds good to me, feel free to post a PR, @rongrong and I will take a look very soon.

beinan avatar Jun 28 '22 17:06 beinan

BTW, would you be willing to share more details on how your benchmark works? I'm interested in reproducing the performance issues you're trying to address. Thanks very much.

YangchenYe323 avatar Jul 06 '22 04:07 YangchenYe323

This is the stress test code:
https://github.com/oppo-bigdata/shuttle/blob/master/src/main/scala/org/apache/spark/testutil/FsWriteStressTest.scala
https://github.com/oppo-bigdata/shuttle/blob/master/src/main/scala/org/apache/spark/testutil/FsReadStressTest.scala
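
For readers who just want the shape of the benchmark, here is a rough Java sketch of a comparable write stress loop against Alluxio's public FileSystem client API; the file count and sizes are placeholders of mine, not the values used in the linked Scala code.

```java
import alluxio.AlluxioURI;
import alluxio.client.file.FileOutStream;
import alluxio.client.file.FileSystem;

public class WriteStressSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.Factory.get();
    byte[] payload = new byte[1 << 20]; // 1 MiB per write call (placeholder)
    long start = System.nanoTime();
    for (int i = 0; i < 128; i++) { // 128 files of 64 MiB each (placeholder)
      try (FileOutStream out = fs.createFile(new AlluxioURI("/stress/file-" + i))) {
        for (int j = 0; j < 64; j++) {
          out.write(payload);
        }
      }
    }
    double secs = (System.nanoTime() - start) / 1e9;
    System.out.printf("Wrote %d MiB in %.1f s (%.1f MiB/s)%n",
        128 * 64, secs, 128 * 64 / secs);
  }
}
```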

JulySouthWind avatar Jul 06 '22 08:07 JulySouthWind

> This is the stress test code:
> https://github.com/oppo-bigdata/shuttle/blob/master/src/main/scala/org/apache/spark/testutil/FsWriteStressTest.scala
> https://github.com/oppo-bigdata/shuttle/blob/master/src/main/scala/org/apache/spark/testutil/FsReadStressTest.scala

What is the maximum QPS you have measured on a single machine? Reading only metadata from the master, I get only about 180,000 QPS.

tokingHong avatar Sep 21 '22 12:09 tokingHong

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in two weeks if no further activity occurs. Thank you for your contributions.

github-actions[bot] avatar Jan 31 '23 15:01 github-actions[bot]