incubator-uniffle
[Improvement] Limit the speed of memory release when drop pending events
In org.apache.uniffle.server.ShuffleFlushManager#processPendingEvents, an OOM can happen if a large number of events need to be dropped, because usedMemory is released immediately while GC is not fast enough to reclaim the dropped objects.
I'm curious how you determined that the OOM is caused by dropped events.
https://stackoverflow.com/questions/8719071/is-it-possible-to-get-outofmemoryerror-because-garbage-collection-too-slow I'm not sure about the effect of this PR.
The purpose of this PR is to reserve more time for GC and reduce the speed of receiving data. Maybe this is not a good way, but I feel it is effective.
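To make the idea concrete, here is a minimal sketch of releasing memory accounting in bounded steps instead of all at once. This is not the code from the PR: the class `ThrottledMemoryRelease`, its methods, and the `maxBytesPerTick` parameter are all hypothetical names made up for illustration, and only the `usedMemory` counter mirrors something mentioned in this issue. It assumes some scheduler calls `tick()` at a fixed interval.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Uniffle's actual implementation.
// Dropped events do not lower usedMemory immediately; the bytes are queued
// and released at most maxBytesPerTick per call to tick().
class ThrottledMemoryRelease {
  private final AtomicLong usedMemory = new AtomicLong();
  private final AtomicLong pendingRelease = new AtomicLong();
  private final long maxBytesPerTick;

  ThrottledMemoryRelease(long maxBytesPerTick) {
    if (maxBytesPerTick <= 0) {
      throw new IllegalArgumentException("maxBytesPerTick must be positive");
    }
    this.maxBytesPerTick = maxBytesPerTick;
  }

  // Called when incoming data is accepted into memory.
  void require(long bytes) {
    usedMemory.addAndGet(bytes);
  }

  // Called when a pending event is dropped: queue the bytes for later release.
  void dropEvent(long bytes) {
    pendingRelease.addAndGet(bytes);
  }

  // Called periodically; releases a bounded amount and returns how much.
  synchronized long tick() {
    long toRelease = Math.min(pendingRelease.get(), maxBytesPerTick);
    pendingRelease.addAndGet(-toRelease);
    usedMemory.addAndGet(-toRelease);
    return toRelease;
  }

  long getUsedMemory() {
    return usedMemory.get();
  }

  long getPendingRelease() {
    return pendingRelease.get();
  }
}
```

With `maxBytesPerTick = 100`, dropping 250 bytes of events takes three ticks to be fully reflected in `usedMemory`. During that time the server still counts the memory as used and accepts less new data, which is what gives GC the extra time.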
From my understanding, if memory is not enough, the JVM will first trigger GC to remove unused objects. This will stop the world.
Only if memory is still not enough after that will it throw an OOM.
What do you think?
When is GC triggered? The direct cause of GC here is that data is written too fast. Maybe this will be solved after replacing gRPC with Netty in the future.
@zuston I misunderstood what you said just now. You are right, but GC will not delete all unused objects at once.
Could you explain the two diagrams? I can't get your point.
When total_dropped_event_num rises rapidly, jvm_memory_bytes_used rises rapidly at the same time. Does that mean the memory growth is triggered by processing pending events?
Hmm... if we can't write data to storage, we will occupy more memory. That's normal. But the two diagrams don't persuade me that GC speed leads to the OOM.