incubator-uniffle [Problem] The shuffle server memory is not released

Uniffle version: 0.6.0. I deploy it on k8s, but the shuffle server's memory is not released, even after my Spark application has stopped. My server config is below:

rss.rpc.server.port 19999
rss.jetty.http.port 19998
rss.rpc.executor.size 2000
rss.storage.type MEMORY_LOCALFILE_HDFS
rss.coordinator.quorum 10.100.41.162:19999
rss.server.disk.capacity 50g
rss.storage.basePath /home/data
rss.server.flush.thread.alive 1
rss.server.flush.threadPool.size 10
rss.server.buffer.capacity 4g
rss.server.read.buffer.capacity 2g
rss.server.heartbeat.timeout 60000
rss.server.heartbeat.interval 10000
rss.rpc.message.max.size 1073741824
rss.server.preAllocation.expired 120000
rss.server.commit.timeout 600000
rss.server.app.expired.withoutHeartbeat 120000
rss.server.flush.cold.storage.threshold.size 128m

Sep 20 '22 09:09 wfxxh
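
To see how much memory the settings above reserve for shuffle data, here is a minimal Python sketch that parses `key value` lines of a server.conf-style file and converts size suffixes such as `4g` and `128m` to bytes. The helpers `parse_conf` and `to_bytes` are hypothetical and not part of Uniffle, and treating `g`/`m` as binary units (1g = 1024^3 bytes) is an assumption about how the server interprets them.

```python
# Hypothetical helpers for illustration; not part of Uniffle.
# Assumption: size suffixes are binary units (1g = 1024**3 bytes).
UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def to_bytes(value: str) -> int:
    """Convert a size like '4g' or '128m' to bytes; bare numbers are bytes."""
    v = value.strip().lower()
    if v and v[-1] in UNITS:
        return int(float(v[:-1]) * UNITS[v[-1]])
    return int(v)


def parse_conf(text: str) -> dict:
    """Parse 'key value' lines into a dict, skipping blanks and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


conf = parse_conf("""
rss.server.buffer.capacity 4g
rss.server.read.buffer.capacity 2g
rss.server.flush.cold.storage.threshold.size 128m
""")

# Write buffer plus read buffer capacity, as configured in this thread.
buffers = (to_bytes(conf["rss.server.buffer.capacity"])
           + to_bytes(conf["rss.server.read.buffer.capacity"]))
print(f"shuffle buffers reserve {buffers / 1024**3:.0f} GiB")
```

With the values from this config, the two buffer capacities add up to 6 GiB.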

The JVM can keep holding the memory even when it isn't processing any data.

Sep 21 '22 02:09 jerqi

But when the memory is full, the shuffle server pod restarts, and in that case my Spark application fails.

Sep 21 '22 02:09 wfxxh

Why does the shuffle server restart? There should be some information in the logs or stdout.

Sep 21 '22 02:09 jerqi

It is restarted by k8s because its memory usage is too high. I think that if the memory were released, this would not happen.

Sep 21 '22 02:09 wfxxh

Maybe we should give more memory to the pod.

Sep 21 '22 03:09 jerqi

It is 32G now; I cannot give it more.

Sep 21 '22 05:09 wfxxh

You can adjust the memory parameters in bin/rss-env.sh and conf/server.conf.

Sep 21 '22 06:09 jerqi

Do you mean XMX_SIZE? It is 30G now.

Sep 21 '22 06:09 wfxxh

Could you reduce the value?

Sep 21 '22 07:09 jerqi

I have reduced it to 8G, but the pod still restarts.

Sep 21 '22 07:09 wfxxh

Does the server restart for the same reason? You give the pod 32G of memory and XMX_SIZE is 8G, right?

Sep 21 '22 09:09 jerqi
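
A pod's memory limit has to cover the whole JVM process, not just the heap: XMX_SIZE caps only the Java heap, while thread stacks, metaspace, and direct (off-heap) buffers live outside it. The arithmetic behind the question above can be sketched in Python as follows. The function `non_heap_headroom_gib` is hypothetical, the numbers are the ones from this thread (a 32G pod, XMX_SIZE of 30G and then 8G, 4g + 2g of buffer capacity), and the check that the buffers fit under XMX_SIZE rests on the assumption that the server keeps these buffers on the heap.

```python
# Hypothetical helper for illustration; values are taken from this thread.
def non_heap_headroom_gib(pod_limit_gib: int, xmx_gib: int) -> int:
    """Memory left in the pod for everything outside the Java heap."""
    return pod_limit_gib - xmx_gib


POD_LIMIT_GIB = 32
# rss.server.buffer.capacity (4g) + rss.server.read.buffer.capacity (2g)
BUFFERS_GIB = 4 + 2

for xmx in (30, 8):
    headroom = non_heap_headroom_gib(POD_LIMIT_GIB, xmx)
    # Assumption: the shuffle buffers are allocated on the heap,
    # so they must fit under XMX_SIZE.
    fits = BUFFERS_GIB <= xmx
    print(f"XMX_SIZE={xmx}G: {headroom}G outside the heap, "
          f"buffers fit in heap: {fits}")
```

With XMX_SIZE at 30G, only about 2G is left for everything outside the heap, including the stacks of up to 2000 RPC threads (rss.rpc.executor.size). With 8G, the non-heap headroom is 24G, so a restart at that setting would suggest looking at memory use outside the heap.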