Error count optimization

Open single-wolf opened this issue 6 years ago • 2 comments

Current behavior

com.weibo.api.motan.transport.netty4.NettyClient

  1. Every successful invocation calls resetErrorCount() and writes the volatile errorCount, which is unnecessary. In addition, the branch marked below should never be reached:
if (state.isUnAliveState()) {
    long count = errorCount.longValue();
    if (count < maxClientConnection) {
        // Should not reach here <<---
        state = ChannelState.ALIVE;
    }
}
  2. A minor concurrency problem when errorCount=9 and maxClientConnection=10:

Thread1 : errorCount.incrementAndGet() -> errorCount(9->10), then slowed down by GC or sync
Thread2 : errorCount.incrementAndGet() -> errorCount(10->11), then slowed down by GC or sync
Thread3 : errorCount.incrementAndGet() -> errorCount(11->12), then slowed down by GC or sync
Thread4 : errorCount.incrementAndGet() -> errorCount(12->13), then slowed down by GC or sync
Thread4 : errorCount.set(0) -> errorCount(13->0)

Then nothing happens, and the client has to wait for another maxClientConnection errors before its state changes.
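
The interleaving described above can be replayed deterministically in a single thread. This is only a minimal sketch, not the Motan code: the class `LostTriggerReplay` and its fields are hypothetical stand-ins, and it assumes that the failing threads re-read the counter after they resume (for example under a lock), which is what lets a concurrent reset hide the failures.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LostTriggerReplay {
    // Threshold taken from the example in the issue text.
    static final long MAX_CLIENT_CONNECTION = 10;

    /** Replays the interleaving step by step; returns true if the client was marked unalive. */
    static boolean replay() {
        AtomicLong errorCount = new AtomicLong(9);
        boolean alive = true;

        // Four failing invocations bump the counter, then stall
        // (GC pause or waiting for a lock) before checking the state.
        long[] seen = new long[4];
        for (int i = 0; i < seen.length; i++) {
            seen[i] = errorCount.incrementAndGet(); // 10, 11, 12, 13
        }

        // A successful invocation resets the counter in the meantime.
        errorCount.set(0);

        // ASSUMPTION: the stalled threads re-read the counter when they resume,
        // so none of them observes a value at or above the threshold.
        for (int i = 0; i < seen.length; i++) {
            if (errorCount.get() >= MAX_CLIENT_CONNECTION && alive) {
                alive = false;
            }
        }
        return !alive;
    }

    public static void main(String[] args) {
        System.out.println("marked unalive: " + replay());
    }
}
```

Under that assumption the replay never marks the client unalive, even though the threshold was crossed four times.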

Optimization

1. Use get() combined with accumulateAndGet(). Set state = ChannelState.ALIVE after a successful reconnect, not here.

private LongBinaryOperator resetErrorCntOp = (prev, zero) -> prev < maxClientConnection ? zero : prev;
void resetErrorCount() {
    if (errorCount.get() != 0L && state.isAliveState()) {
        errorCount.accumulateAndGet(0L, resetErrorCntOp);
    }
}
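
To see how the conditional reset behaves in isolation, here is a small standalone sketch. The class `ConditionalReset` and the constant `MAX_CLIENT_CONNECTION` are hypothetical, and the `state.isAliveState()` check from the proposal is left out so the example has no Motan dependencies.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongBinaryOperator;

public class ConditionalReset {
    static final long MAX_CLIENT_CONNECTION = 10; // illustrative threshold

    // Replace the count with the supplied value only while it is below the threshold.
    static final LongBinaryOperator RESET_OP =
            (prev, zero) -> prev < MAX_CLIENT_CONNECTION ? zero : prev;

    /** Returns the counter value after the (possibly skipped) reset. */
    static long reset(AtomicLong errorCount) {
        // A plain volatile read avoids the CAS on the common path where the count is already 0.
        if (errorCount.get() != 0L) {
            return errorCount.accumulateAndGet(0L, RESET_OP);
        }
        return 0L;
    }

    public static void main(String[] args) {
        System.out.println(reset(new AtomicLong(3)));  // below the threshold: reset
        System.out.println(reset(new AtomicLong(10))); // at the threshold: left untouched
    }
}
```

A count that has already reached the threshold is preserved, so a successful invocation cannot silently erase the evidence that the client should be marked unalive.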

2. Use incrementAndGet() == maxClientConnection to trigger the state change

void incrErrorCount() {
    if (errorCount.incrementAndGet() == maxClientConnection && state.isAliveState()) {
        LoggerUtil.error("NettyClient unavailable Error: url=" + url.getIdentity() + " "
                + url.getServerPortStr());
        state = ChannelState.UNALIVE;
    }
}
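
The point of comparing with `==` is that, as long as nothing resets the counter in between, exactly one thread observes the threshold value and so exactly one thread performs the state change. The following sketch checks that property with plain JDK classes; `ExactlyOnceTrigger` and `countTriggers` are hypothetical names, and the state change is replaced by a simple counter.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class ExactlyOnceTrigger {
    /** Runs concurrent failures and returns how many times the threshold branch fired. */
    static int countTriggers(int threads, int failuresPerThread, long max) {
        AtomicLong errorCount = new AtomicLong();
        AtomicInteger triggers = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < failuresPerThread; i++) {
                    // Each increment returns a distinct value, so only one call can equal max.
                    if (errorCount.incrementAndGet() == max) {
                        triggers.incrementAndGet(); // stands in for state = UNALIVE
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return triggers.get();
    }

    public static void main(String[] args) {
        System.out.println("triggers: " + countTriggers(8, 1000, 10));
    }
}
```

Note that this guarantee only holds while the counter is not reset concurrently, which is why the proposal pairs it with the conditional reset.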

That looks clearer and is a little more efficient.

Motan version

1.1.6

JVM version (e.g. java -version)

java version "1.8.0_131"

single-wolf, Oct 14 '19 16:10

@sunnights PTAL

single-wolf, Oct 14 '19 16:10

Current behavior

  1. If a client has three channels and one of them is unalive while the other two are still alive (because of load balancing or something else), the client may always be considered alive and still fail to send requests.

Optimization

  1. Should we keep a map (channel -> channel's errorCnt) in the client to track every channel? If a channel's errorCnt exceeds the threshold, we try to reconnect it in another thread. The alive state of the client could then be based on the count or percentage of alive channels.
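
One possible shape for such a per-channel map is sketched below. Everything here is hypothetical: `ChannelErrorTracker`, its methods, and the `minAliveRatio` parameter are illustrative names, not part of Motan, and the reconnect itself is left to the caller.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class ChannelErrorTracker<C> {
    private final ConcurrentMap<C, AtomicLong> errors = new ConcurrentHashMap<>();
    private final long maxErrors;       // per-channel error threshold (assumed parameter)
    private final double minAliveRatio; // fraction of alive channels needed (assumed parameter)

    public ChannelErrorTracker(long maxErrors, double minAliveRatio) {
        this.maxErrors = maxErrors;
        this.minAliveRatio = minAliveRatio;
    }

    public void register(C channel) {
        errors.putIfAbsent(channel, new AtomicLong());
    }

    /** Returns true exactly when this failure reaches the threshold, so the caller can schedule a reconnect. */
    public boolean recordFailure(C channel) {
        return errors.computeIfAbsent(channel, c -> new AtomicLong()).incrementAndGet() == maxErrors;
    }

    public void recordSuccess(C channel) {
        AtomicLong count = errors.get(channel);
        if (count != null) {
            count.set(0L);
        }
    }

    /** Fraction of registered channels whose error count is below the threshold. */
    public double aliveRatio() {
        if (errors.isEmpty()) {
            return 0.0;
        }
        long alive = errors.values().stream().filter(c -> c.get() < maxErrors).count();
        return (double) alive / errors.size();
    }

    public boolean isClientAlive() {
        return aliveRatio() >= minAliveRatio;
    }
}
```

With three registered channels and a ratio of 0.5, one failing channel leaves the client alive, while two failing channels mark it unalive.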

single-wolf, Oct 15 '19 02:10