incubator-uniffle icon indicating copy to clipboard operation
incubator-uniffle copied to clipboard

[#2090] fix(client): Replace LinkedList with CopyOnWriteArrayList to store the sending status

Open xianjingfeng opened this issue 1 year ago • 3 comments

What changes were proposed in this pull request?

Replace LinkedList with CopyOnWriteArrayList to store the sending status

Why are the changes needed?

Fix: #2090

Does this PR introduce any user-facing change?

No.

How was this patch tested?

CI

xianjingfeng avatar Aug 30 '24 08:08 xianjingfeng

CopyOnWriteArrayList is suitable case that read is more, write is less.

jerqi avatar Aug 30 '24 08:08 jerqi

CopyOnWriteArrayList is suitable case that read is more, write is less.

I know, I think this scene is suitable.

xianjingfeng avatar Aug 30 '24 08:08 xianjingfeng

Test Results

 2 807 files  +25   2 807 suites  +25   5h 52m 10s :stopwatch: + 4m 0s    992 tests + 3     991 :white_check_mark: + 5   1 :zzz: ±0  0 :x:  - 1  12 463 runs  +55  12 448 :white_check_mark: +57  15 :zzz: ±0  0 :x:  - 1 

Results for commit 93e53b97. ± Comparison against base commit 0f9d9bcb.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Aug 30 '24 08:08 github-actions[bot]

@xianjingfeng Do you think Collections.synchronizedList could be better?

As ShuffleWriteClientImpl#recordFailedBlocks add all the failed blocks to the list, if we replace to CopyOnWriteArrayList, the recordFailedBlocks performance will get worse as each add will cause new list & copy.

  void recordFailedBlocks(
      FailedBlockSendTracker blockIdsSendFailTracker,
      Map<ShuffleServerInfo, Map<Integer, Map<Integer, List<ShuffleBlockInfo>>>> serverToBlocks,
      ShuffleServerInfo shuffleServerInfo,
      StatusCode statusCode) {
    serverToBlocks.getOrDefault(shuffleServerInfo, Collections.emptyMap()).values().stream()
        .flatMap(innerMap -> innerMap.values().stream())
        .flatMap(List::stream)
        .forEach(block -> blockIdsSendFailTracker.add(block, shuffleServerInfo, statusCode));
  }

I'm not sure this is a read is more, write is less scenario. @zuston If I support Partition split leverage blockIdsSendFailTracker, do you think the problem getting worse that now?

maobaolong avatar Sep 02 '24 02:09 maobaolong

Why not using the synchronized to ensure the safety of race condition ? @xianjingfeng I also concern the performance degression when using the CopyOnWrite

zuston avatar Sep 02 '24 05:09 zuston

Do you think Collections.synchronizedList could be better?

It not thread-safe when iterating over it. @maobaolong

Why not using the synchronized to ensure the safety of race condition ?

For the job that finally runs successfully, there is at most one element in the list, so I think this performance impact is acceptable. If we use synchronized, we need to pay attention to thread-safe issues when maintaining this part of the code later. synchronized is ok for me too.

xianjingfeng avatar Sep 02 '24 06:09 xianjingfeng

https://github.com/xianjingfeng/incubator-uniffle/commit/35d0cb485c35dbb3bd6b798e2a04091062be3996 This is a solution with synchronized, which I don't think is a good solution. If you think it's OK, I will use this solution. @jerqi @zuston @maobaolong

xianjingfeng avatar Sep 02 '24 06:09 xianjingfeng

It not thread-safe when iterating over it.

@xianjingfeng Thanks to share this.

And the following trick can resolve the iterator safe issue.

List<Object> list = Collections.synchronizedList(new ArrayList<>());
synchronized (list) {
    Iterator<Object> iterator = list.iterator();
    while (iterator.hasNext()) {
        // ...
    }
}

For the job that finally runs successfully, there is at most one element in the list, so I think this performance impact is acceptable

I'm just implementing a partition-split feature leverage on blockIdsSendFailTracker, which can invoke the add method frequently than before.

If the existing state add is invoked in low frequent, I'm ok for this PR and I will resolve the performance down issue in that PR.

maobaolong avatar Sep 02 '24 08:09 maobaolong

Changed to use Collections.synchronizedList @maobaolong @jerqi @zuston

xianjingfeng avatar Sep 02 '24 10:09 xianjingfeng