Sentinel icon indicating copy to clipboard operation
Sentinel copied to clipboard

Fix concurrent error in sentinel cluster client

Open XHao opened this issue 2 years ago • 4 comments

Describe what this PR does / why we need it

In TokenClientPromiseHolder.completePromise(int xid, ClusterResponse<T> response), we get the promise from PROMISE_MAP, and the promise may be null. There are two possible situations:

  • The response is timeout and NettyTransportClient remove it through TokenClientPromiseHolder.remove(xid)(OK)
  • The response is received but NettyTransportClient haven't invoke putPromise(xid, promise)(concurrent error, mostly GC hanppens)

This PR will fix it to avoid client getting timeout exception in promise.await

Does this pull request fix one issue?

Fixes #2852

Describe how you did it

  1. TokenPromise use SynchronousQueue to avoid concurrent error (like chan in golang)
  2. TokenPromise use startTime and poll to avoid client blocking too long
  3. TokenClientPromiseHolder use ThreadPool to avoid TokenClientHandler blocking too long in channelRead

Describe how to verify it

In TokenPromiseTest, there are three unit test cases

  • normalTest : rt is 3ms
  • FastConsumerTest : rt is 0ms, response is received before put promise
  • timeoutTest : rt is 20ms

XHao avatar Sep 01 '22 03:09 XHao

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Sep 01 '22 03:09 CLAassistant

Could you please sign the CLA: https://cla-assistant.io/alibaba/Sentinel?pullRequest=2861

sczyh30 avatar Sep 02 '22 07:09 sczyh30

Could you please illustrate your design in PR description?

sczyh30 avatar Sep 05 '22 03:09 sczyh30

Could you please illustrate your design in PR description?

done

XHao avatar Sep 05 '22 14:09 XHao

Hi @XHao , if the concurrent issue is caused by the channel.writeAndFlush(request); before the TokenClientPromiseHolder.putPromise(xid, promise);. why we can't solve it by move the TokenClientPromiseHolder.putPromise(xid, promise); ahead?

brotherlu-xcq avatar Nov 18 '22 09:11 brotherlu-xcq

Hi @XHao , if the concurrent issue is caused by the channel.writeAndFlush(request); before the TokenClientPromiseHolder.putPromise(xid, promise);. why we can't solve it by move the TokenClientPromiseHolder.putPromise(xid, promise); ahead?

Sorry, I made a stupid mistake. 😅 channel.writeAndFlush() has no business with channel.newPromise(), so move TokenClientPromiseHolder.put(xid, channel.newPromise()); ahead can solve this problem. This PR is too complicated, and I will close it.😓

XHao avatar Nov 18 '22 12:11 XHao