[#2111] improvement(client): cleanup the useless shuffle server clients when unregister shuffle
What changes were proposed in this pull request?
Cleanup useless the shuffle server clients when unregister shuffle
Why are the changes needed?
Fix: #2111
Does this PR introduce any user-facing change?
No.
How was this patch tested?
UT
Test Results
3 034 files + 152 3 034 suites +152 6h 43m 35s ⏱️ + 46m 53s 1 179 tests + 160 1 178 ✅ + 160 1 💤 ±0 0 ❌ ±0 14 937 runs +2 069 14 922 ✅ +2 069 15 💤 ±0 0 ❌ ±0
Results for commit 8729ecdb. ± Comparison against base commit d170004a.
This pull request removes 30 and adds 190 tests. Note that renamed tests count towards both.
org.apache.uniffle.common.records.RecordsReaderWriterTest ‑ testWriteAndReadRecordFile1{String, File}[1]
org.apache.uniffle.common.records.RecordsReaderWriterTest ‑ testWriteAndReadRecordFile1{String, File}[2]
org.apache.uniffle.common.records.RecordsReaderWriterTest ‑ testWriteAndReadRecordFile2{String, File}[1]
org.apache.uniffle.common.records.RecordsReaderWriterTest ‑ testWriteAndReadRecordFile2{String, File}[2]
org.apache.uniffle.common.records.RecordsReaderWriterTest ‑ testWriteAndReadRecordFile3{String, File}[1]
org.apache.uniffle.common.records.RecordsReaderWriterTest ‑ testWriteAndReadRecordFile3{String, File}[2]
org.apache.uniffle.common.records.RecordsReaderWriterTest ‑ testWriteAndReadRecordFile4{String, File}[1]
org.apache.uniffle.common.records.RecordsReaderWriterTest ‑ testWriteAndReadRecordFile4{String, File}[2]
org.apache.uniffle.common.serializer.PartialInputStreamTest ‑ testReadFileInputStream
org.apache.uniffle.common.serializer.PartialInputStreamTest ‑ testReadMemroyInputStream
…
org.apache.hadoop.mapred.SortWriteBufferManagerTest ‑ testWriteNormalWithRemoteMerge
org.apache.hadoop.mapred.SortWriteBufferManagerTest ‑ testWriteNormalWithRemoteMergeAndCombine
org.apache.hadoop.mapreduce.task.reduce.RMRssShuffleTest ‑ testReadShuffleWithCombine
org.apache.hadoop.mapreduce.task.reduce.RMRssShuffleTest ‑ testReadShuffleWithoutCombine
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testDefaultIncludeExcludeProperties
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testExcludeProperties
org.apache.spark.shuffle.DelegationRssShuffleManagerTest ‑ testIncludeProperties
org.apache.spark.shuffle.handle.MutableShuffleHandleInfoTest ‑ testUpdateAssignmentOnPartitionSplit
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.RMRssShuffleTest ‑ testReadMultiPartitionShuffleData
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.RMRssShuffleTest ‑ testReadShuffleData
…
:recycle: This comment has been updated with latest results.
What will it happen if we don't have this patch?
What will it happen if we don't have this patch?
If there are many stages in a application, the client will keep the clients for all shuffle servers, which will waste memory.
What will it happen if we don't have this patch?
If there are many stages in a application, the client will keep the clients for all shuffle servers, which will waste memory.
One server may be used by multiple stages. We can't remove it if only one stage is finished.
What will it happen if we don't have this patch?
If there are many stages in a application, the client will keep the clients for all shuffle servers, which will waste memory.
One server may be used by multiple stages. We can't remove it if only one stage is finished.
Right, not all the shuffle servers will be deleted in this PR
I hope the closing way should be optional,
+1
which can be implemented by the pluggable closable policies.
What are the possible policies?
@zuston gently ping.
@zuston gently ping.
Sorry the late review and I think i missed this thread. @xianjingfeng
How about get the client through a guava cache with expire time setting, maybe it's more common without caring about whether the corresponding server is useless or availeble
This solution will have problems if the client is only retrieved from the cache once when it is used in the future.
I think I should close this PR. Because I find ShuffleServerClientFactory#getShuffleServerClient needs to be optimized more than it. #2441