[#2090] fix(client): Replace LinkedList with CopyOnWriteArrayList to store the sending status
What changes were proposed in this pull request?
Replace LinkedList with CopyOnWriteArrayList to store the sending status
Why are the changes needed?
Fix: #2090
Does this PR introduce any user-facing change?
No.
How was this patch tested?
CI
CopyOnWriteArrayList is suitable case that read is more, write is less.
CopyOnWriteArrayList is suitable case that read is more, write is less.
I know, I think this scene is suitable.
Test Results
2 807 files +25 2 807 suites +25 5h 52m 10s :stopwatch: + 4m 0s 992 tests + 3 991 :white_check_mark: + 5 1 :zzz: ±0 0 :x: - 1 12 463 runs +55 12 448 :white_check_mark: +57 15 :zzz: ±0 0 :x: - 1
Results for commit 93e53b97. ± Comparison against base commit 0f9d9bcb.
:recycle: This comment has been updated with latest results.
@xianjingfeng Do you think Collections.synchronizedList could be better?
As ShuffleWriteClientImpl#recordFailedBlocks add all the failed blocks to the list, if we replace to CopyOnWriteArrayList, the recordFailedBlocks performance will get worse as each add will cause new list & copy.
void recordFailedBlocks(
FailedBlockSendTracker blockIdsSendFailTracker,
Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
ShuffleServerInfo shuffleServerInfo,
StatusCode statusCode) {
serverToBlocks.getOrDefault(shuffleServerInfo, Collections.emptyMap()).values().stream()
.flatMap(innerMap -> innerMap.values().stream())
.flatMap(List::stream)
.forEach(block -> blockIdsSendFailTracker.add(block, shuffleServerInfo, statusCode));
}
I'm not sure this is a read is more, write is less scenario. @zuston If I support Partition split leverage blockIdsSendFailTracker, do you think the problem getting worse that now?
Why not using the synchronized to ensure the safety of race condition ? @xianjingfeng I also concern the performance degression when using the CopyOnWrite
Do you think
Collections.synchronizedListcould be better?
It not thread-safe when iterating over it. @maobaolong
Why not using the
synchronizedto ensure the safety of race condition ?
For the job that finally runs successfully, there is at most one element in the list, so I think this performance impact is acceptable. If we use synchronized, we need to pay attention to thread-safe issues when maintaining this part of the code later. synchronized is ok for me too.
https://github.com/xianjingfeng/incubator-uniffle/commit/35d0cb485c35dbb3bd6b798e2a04091062be3996 This is a solution with synchronized, which I don't think is a good solution. If you think it's OK, I will use this solution. @jerqi @zuston @maobaolong
It not thread-safe when iterating over it.
@xianjingfeng Thanks to share this.
And the following trick can resolve the iterator safe issue.
List<Object> list = Collections.synchronizedList(new ArrayList<>());
synchronized (list) {
Iterator<Object> iterator = list.iterator();
while (iterator.hasNext()) {
// ...
}
}
For the job that finally runs successfully, there is at most one element in the list, so I think this performance impact is acceptable
I'm just implementing a partition-split feature leverage on blockIdsSendFailTracker, which can invoke the add method frequently than before.
If the existing state add is invoked in low frequent, I'm ok for this PR and I will resolve the performance down issue in that PR.
Changed to use Collections.synchronizedList @maobaolong @jerqi @zuston