incubator-uniffle

[Improvement] Introduce local allocation buffer to store blocks in memory

Open • xianjingfeng opened this issue 1 year ago • 2 comments

Code of Conduct

Search before asking

  • [X] I have searched in the issues and found no similar issues.

What would you like to be improved?

We currently store shuffle data in off-heap memory in the shuffle server, but I found that the server still uses a lot of heap memory. Here are the top entries of the jmap -histo output:

 num     #instances         #bytes  class name
   1:     189601376    16684921088  io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeDirectByteBuf
   2:     189860728    15188858240  java.nio.DirectByteBuffer
   3:     189605871    13651622712  jdk.internal.ref.Cleaner
   4:     189018520    10585037120  org.apache.uniffle.common.ShufflePartitionedBlock
   5:     189605871     7584234840  java.nio.DirectByteBuffer$Deallocator

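The histogram shows that the heap usage comes from the sheer number of blocks: there are about 189 million ShufflePartitionedBlock instances and, judging by the near-identical instance counts, roughly one Netty direct ByteBuf, DirectByteBuffer, Cleaner and Deallocator per block, adding up to about 63.7 GB of heap. There are so many blocks because each block is very small.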
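Dividing each row's bytes by its instance count gives the shallow size of one object, and adding those sizes up shows what each small block costs on the heap regardless of its payload. The sketch below is plain Java arithmetic over the numbers in the histogram; the assumption of roughly one object of each class per block comes from the near-identical instance counts, not from the Uniffle source.

```java
// Per-block heap overhead computed from the jmap -histo rows in this issue.
final class BlockOverhead {
    // Rows copied from the histogram: class name, #instances, #bytes.
    static final String[] CLASSES = {
        "InstrumentedUnpooledUnsafeDirectByteBuf",
        "java.nio.DirectByteBuffer",
        "jdk.internal.ref.Cleaner",
        "org.apache.uniffle.common.ShufflePartitionedBlock",
        "java.nio.DirectByteBuffer$Deallocator"
    };
    static final long[] INSTANCES = {189601376L, 189860728L, 189605871L, 189018520L, 189605871L};
    static final long[] BYTES = {16684921088L, 15188858240L, 13651622712L, 10585037120L, 7584234840L};

    /** Shallow size of one instance of row i (every row divides evenly). */
    static long bytesPerInstance(int i) {
        return BYTES[i] / INSTANCES[i];
    }

    /** Heap bytes per block, assuming roughly one instance of each class per block. */
    static long perBlockOverhead() {
        long sum = 0;
        for (int i = 0; i < CLASSES.length; i++) {
            sum += bytesPerInstance(i);
        }
        return sum;
    }

    /** Total heap bytes taken by these five classes. */
    static long totalBytes() {
        long sum = 0;
        for (long b : BYTES) {
            sum += b;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int i = 0; i < CLASSES.length; i++) {
            System.out.printf("%-50s %3d bytes/instance%n", CLASSES[i], bytesPerInstance(i));
        }
        System.out.printf("per-block overhead: %d bytes%n", perBlockOverhead());
        System.out.printf("total: %.1f GiB%n", totalBytes() / (double) (1L << 30));
    }
}
```

Every row divides evenly (88, 80, 72, 56 and 40 bytes), so each block carries about 336 bytes of heap objects before its data is even counted.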

How should we improve?

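Introduce a local allocation buffer similar to the MSLAB (MemStore-Local Allocation Buffer) in HBase, so that many small blocks can be copied into a few large, pre-allocated chunks. See: https://hbase.apache.org/book.html#gcpause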
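The core MSLAB idea is to copy many small allocations into a few large, fixed-size chunks and hand out views into them, so only each chunk owns native memory (with its own Cleaner and Deallocator), while each block is just a lightweight view. The sketch below illustrates that idea with plain java.nio buffers. LocalAllocationBuffer, copyIn and the chunk/max-allocation parameters are hypothetical names, not Uniffle or HBase APIs; a real implementation would more likely use Netty ByteBufs, reference counting and a chunk pool.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical MSLAB-style allocator; class and method names are illustrative only.
final class LocalAllocationBuffer {
    private final int chunkSize;      // size of each shared direct chunk
    private final int maxAllocSize;   // blocks larger than this bypass the buffer
    private final List<ByteBuffer> chunks = new ArrayList<>();
    private ByteBuffer current;       // chunk currently being filled

    LocalAllocationBuffer(int chunkSize, int maxAllocSize) {
        if (maxAllocSize <= 0 || maxAllocSize > chunkSize) {
            throw new IllegalArgumentException("need 0 < maxAllocSize <= chunkSize");
        }
        this.chunkSize = chunkSize;
        this.maxAllocSize = maxAllocSize;
    }

    /**
     * Copies a block into the current chunk and returns a view over exactly those bytes,
     * or null if the block is too large and should be allocated on its own.
     */
    ByteBuffer copyIn(byte[] block) {
        if (block.length > maxAllocSize) {
            return null;
        }
        if (current == null || current.remaining() < block.length) {
            // Start a new chunk; the unused tail of the old one is wasted, as in MSLAB.
            current = ByteBuffer.allocateDirect(chunkSize);
            chunks.add(current);
        }
        int offset = current.position();
        current.put(block);
        // The view shares the chunk's memory but has its own position and limit.
        ByteBuffer view = current.duplicate();
        view.position(offset);
        view.limit(offset + block.length);
        return view.slice();
    }

    /** Number of direct chunks allocated so far. */
    int chunkCount() {
        return chunks.size();
    }

    public static void main(String[] args) {
        // Example sizes only: 2 MiB chunks, blocks up to 256 KiB go through the buffer.
        LocalAllocationBuffer lab = new LocalAllocationBuffer(2 << 20, 256 << 10);
        for (int i = 0; i < 10_000; i++) {
            lab.copyIn(new byte[100]);
        }
        System.out.println("chunks: " + lab.chunkCount());
    }
}
```

Here 10,000 blocks of 100 bytes share a single 2 MiB chunk instead of 10,000 separate direct buffers. The trade-off, as in HBase, is that a chunk's memory can only be reclaimed once every block in it has been released, and the unused tail of each chunk is wasted; the maximum allocation size keeps large blocks from fragmenting chunks.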

Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

xianjingfeng • May 21 '24 02:05

@jerqi @zuston @advancedxy @rickyma PTAL. I'm quite busy at the moment. If anyone is interested in it, feel free to pick it up.

xianjingfeng • May 21 '24 02:05

This issue seems feasible. I'll take a look first. We need this too.

In the meantime, there are a few things we can do to make blocks larger:

  1. Set spark.rss.writer.buffer.spill.size to a higher value to make blocks larger, e.g. 1g or 2g.
  2. Set rss.client.memory.spill.ratio to less than 0.5, e.g. 0.3, so that larger blocks are spilled first.
  3. Set spark.rss.writer.buffer.size to a larger value, e.g. 10m (see https://github.com/apache/incubator-uniffle/issues/1594#issuecomment-2081378887).
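Why larger blocks help can be checked with simple arithmetic: for a fixed amount of buffered data, the number of blocks, and with it the per-block heap overhead, falls in proportion to the average block size. A minimal sketch, assuming the roughly 336 bytes per block implied by the histogram in this issue (88 + 80 + 72 + 56 + 40) and a hypothetical 100 GiB of buffered shuffle data; how the settings above translate into actual block sizes is not modeled here.

```java
// Hypothetical estimate of per-block heap overhead for different average block sizes.
final class BlockCountEstimate {
    // Per-block heap overhead in bytes, taken from the jmap -histo rows
    // (ByteBuf 88 + DirectByteBuffer 80 + Cleaner 72 + ShufflePartitionedBlock 56 + Deallocator 40).
    static final long PER_BLOCK_OVERHEAD = 336L;

    /** Number of blocks needed to hold dataBytes at the given average block size, rounded up. */
    static long blockCount(long dataBytes, long avgBlockBytes) {
        return (dataBytes + avgBlockBytes - 1) / avgBlockBytes;
    }

    /** Heap bytes spent on per-block objects, excluding the block payload itself. */
    static long heapOverhead(long dataBytes, long avgBlockBytes) {
        return blockCount(dataBytes, avgBlockBytes) * PER_BLOCK_OVERHEAD;
    }

    public static void main(String[] args) {
        long data = 100L << 30; // hypothetical: 100 GiB of buffered shuffle data
        for (long blockSize : new long[] {1L << 10, 64L << 10, 1L << 20}) {
            System.out.printf("avg block %7d B -> %,12d blocks, %,15d B heap overhead%n",
                blockSize, blockCount(data, blockSize), heapOverhead(data, blockSize));
        }
    }
}
```

Under these assumptions, going from 1 KiB to 1 MiB average blocks cuts the block count, and the heap spent on block objects, by a factor of 1024.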

rickyma • May 21 '24 07:05