
[Bug] SUnreclaim memory usage is too high and the memory is not released

Open lifeSo opened this issue 1 year ago • 2 comments

Code of Conduct

Search before asking

  • [X] I have searched in the issues and found no similar issues.

Describe the bug

We launched 8 Shuffle Server machines, and memory usage on one of them is very high. After diagnosing it, we found that SUnreclaim is too high, as reported in /proc/meminfo:

image

Even though the RSS process is down, the memory is not released.

image
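Since the screenshots do not survive as text, here is a minimal sketch in Python of how the same check could be reproduced: it parses `/proc/meminfo`-style text and reports how much of total memory is unreclaimable slab. The helper names `parse_meminfo` and `sunreclaim_fraction` and the sample values are illustrative assumptions, not data from the affected machine.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def sunreclaim_fraction(fields):
    """Fraction of total memory held by unreclaimable kernel slab."""
    return fields["SUnreclaim"] / fields["MemTotal"]


# Illustrative sample only, NOT the values from the screenshots.
SAMPLE = """\
MemTotal:       263856000 kB
Slab:           120000000 kB
SReclaimable:     2000000 kB
SUnreclaim:     118000000 kB
"""

if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as f:
            text = f.read()
    else:
        text = SAMPLE  # fall back to the sample on non-Linux systems
    info = parse_meminfo(text)
    print("SUnreclaim: %d kB (%.1f%% of MemTotal)"
          % (info["SUnreclaim"], 100 * sunreclaim_fraction(info)))
```

SUnreclaim counts kernel slab memory, not process memory, which would be consistent with the value staying high after the RSS process exits.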

Affects Version(s)

0.7.0

Uniffle Server Log Output

No response

Uniffle Engine Log Output

No response

Uniffle Server Configurations

No response

Uniffle Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

lifeSo avatar Jan 24 '24 06:01 lifeSo

Can you run jmap -histo:live to show what occupies the memory while the RSS server process is still alive? I think it will probably be io.netty.buffer.ReadOnlyByteBufferBuf from org.apache.uniffle.common.ShufflePartitionedBlock.

rickyma avatar Jan 24 '24 11:01 rickyma

@rickyma We didn't run jmap -histo:live, because we think the RSS process's memory usage is OK: as the picture above shows, it uses only about half of the memory. The machine was restarted later. If the problem shows up again, I think we could run slabtop.

lifeSo avatar Jan 25 '24 02:01 lifeSo
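
As a companion to the slabtop suggestion above, the following is a rough sketch in Python of ranking slab caches by size from `/proc/slabinfo`-style text, in case the output needs to be captured in a script. It assumes the slabinfo 2.x column layout and a 4096-byte page size, and it approximates each cache's size as `num_slabs * pagesperslab * page_size`. The function names and the sample rows are hypothetical and are not measurements from the affected machine.

```python
def parse_slabinfo(text, page_size=4096):
    """Return {cache name: approximate size in bytes} from slabinfo 2.x text.

    Size is approximated as num_slabs * pagesperslab * page_size.
    The page size of 4096 bytes is an assumption; adjust it if needed.
    """
    sizes = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue  # skip the version line and the column header
        parts = line.rsplit(":", 2)
        if len(parts) != 3:
            continue
        cols = parts[0].split()   # name, active_objs, num_objs, objsize, objperslab, pagesperslab
        slab = parts[2].split()   # "slabdata", active_slabs, num_slabs, sharedavail
        if len(cols) < 6 or len(slab) < 3:
            continue
        pages_per_slab = int(cols[5])
        num_slabs = int(slab[2])
        sizes[cols[0]] = num_slabs * pages_per_slab * page_size
    return sizes


def top_slab_caches(text, n=10, page_size=4096):
    """Return the n largest caches as (name, bytes) pairs, largest first."""
    sizes = parse_slabinfo(text, page_size)
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical sample rows for illustration only.
SAMPLE_SLABINFO = """\
slabinfo - version: 2.1
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-64         12345  12800     64   64    1 : tunables    0    0    0 : slabdata    200    200      0
kmalloc-1024       15800  16000   1024   32    8 : tunables    0    0    0 : slabdata    500    500      0
"""

if __name__ == "__main__":
    # Reading the real /proc/slabinfo usually requires root, so the
    # sample text is used here instead.
    for name, size in top_slab_caches(SAMPLE_SLABINFO):
        print("%-20s %12d bytes" % (name, size))
```

Knowing which cache dominates would help narrow down which kernel subsystem is holding the unreclaimable memory.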