incubator-uniffle
[Problem] The shuffle server does not release memory
Uniffle version: 0.6.0. I deployed on Kubernetes, but the shuffle server does not release memory. Even after my Spark application stops, the memory is still not released. My server config is below:
rss.rpc.server.port 19999
rss.jetty.http.port 19998
rss.rpc.executor.size 2000
rss.storage.type MEMORY_LOCALFILE_HDFS
rss.coordinator.quorum 10.100.41.162:19999
rss.server.disk.capacity 50g
rss.storage.basePath /home/data
rss.server.flush.thread.alive 1
rss.server.flush.threadPool.size 10
rss.server.buffer.capacity 4g
rss.server.read.buffer.capacity 2g
rss.server.heartbeat.timeout 60000
rss.server.heartbeat.interval 10000
rss.rpc.message.max.size 1073741824
rss.server.preAllocation.expired 120000
rss.server.commit.timeout 600000
rss.server.app.expired.withoutHeartbeat 120000
rss.server.flush.cold.storage.threshold.size 128m
The JVM can keep occupying memory even when it is not processing any data.
But when the memory fills up, the shuffle server pod restarts, which causes my Spark application to fail.
Why does the shuffle server restart? There should be some information in the logs or stdout.
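To see why Kubernetes restarted the pod, commands like the following can show the last termination state (a sketch; the pod name and namespace here are placeholders, replace them with your own):

```
# Show the container's last termination state; "OOMKilled" means the
# container exceeded its cgroup memory limit.
kubectl describe pod rss-shuffle-server-0 -n uniffle | grep -A 5 "Last State"

# Or pull the last termination reason directly:
kubectl get pod rss-shuffle-server-0 -n uniffle \
  -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```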
It is restarted by Kubernetes; the reason is that memory usage is too high. I think if the memory were released, this would not happen.
Maybe we should give more memory to the pod.
It is 32G now; I cannot give more.
You can adjust the memory parameter in bin/rss-env.sh and conf/server.conf.
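For reference, the JVM heap size is set via XMX_SIZE in bin/rss-env.sh; a minimal sketch (the 8g value is only an example, not a recommendation):

```shell
# bin/rss-env.sh (fragment) -- sets the shuffle server JVM heap size.
# XMX_SIZE should stay well below the pod memory limit so that off-heap
# buffers, metaspace, and other JVM overhead still fit inside the limit.
XMX_SIZE="8g"
```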
XMX_SIZE? It is 30G now.
Could you reduce the value?
I have reduced it to 8G, but the pod still restarts.
Does the server restart for the same reason? You gave the pod 32G of memory and XMX_SIZE is 8G, didn't you?
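Either way, it is worth checking that the pod limit covers more than just the heap: direct buffers, metaspace, thread stacks, and GC overhead all count toward the container's cgroup memory limit. A back-of-the-envelope budget, using numbers from this thread and assumed overhead values (and assuming the shuffle buffers are allocated off-heap; if they live inside the heap they are already covered by XMX_SIZE):

```shell
# Rough memory budget for the shuffle server pod.
# All values are illustrative assumptions, not measured numbers.
heap_gb=8        # XMX_SIZE after the reduction
buffers_gb=6     # rss.server.buffer.capacity (4g) + rss.server.read.buffer.capacity (2g)
overhead_gb=2    # assumed metaspace, thread stacks, and GC/JVM overhead
total_gb=$((heap_gb + buffers_gb + overhead_gb))
echo "estimated peak usage: ${total_gb}G (pod limit: 32G)"
```

If the estimated total approaches the pod limit, the container can still be OOMKilled even though the heap itself looks small.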