Dubbo 性能调优总结文档

Open icodening opened this issue 3 years ago • 0 comments

Dubbo 性能调优总结文档

1.结论

先说本次dubbo调优结论，从以下benchmark的数据可得知，小报文场景下性能提高了约37%，较大报文的场景下性能提高了约44%。

(协议:dubbo, 序列化:fastjson2, 测试时间:2022.11.09)
3.2 最新分支
Benchmark           Mode  Cnt   Score    Error   Units
Client.createUser  thrpt    3  82.743 ± 13.208  ops/ms
Client.existUser   thrpt    3  92.768 ± 20.261  ops/ms
Client.getUser     thrpt    3  82.923 ± 24.910  ops/ms
Client.listUser    thrpt    3  50.190 ± 28.928  ops/ms

3.1.2
Benchmark           Mode  Cnt   Score    Error   Units
Client.createUser  thrpt    3  60.241 ± 19.312  ops/ms
Client.existUser   thrpt    3  67.606 ± 39.453  ops/ms
Client.getUser     thrpt    3  62.503 ± 38.274  ops/ms
Client.listUser    thrpt    3  34.822 ± 18.078  ops/ms

benchmark仓库: https://github.com/apache/dubbo-benchmark

2.核心改动点

批量发送请求 https://github.com/apache/dubbo/pull/10728
用户线程encode https://github.com/apache/dubbo/pull/10854

3.原理介绍

3.1 批量发送

通过了解Netty4的原理我们可以得知，当调用channel的writeAndFlush方法时，Netty会判断当前发送请求的线程是否是当前channel所绑定的EventLoop线程，如果不是EventLooop则会构造一个写任务WriteTask并将其提交到EventLoop中稍后执行,其代码(io.netty.channel.AbstractChannelHandlerContext)如下：

private void write(Object msg, boolean flush, ChannelPromise promise) {
        ObjectUtil.checkNotNull(msg, "msg");
        try {
            if (isNotValidPromise(promise, true)) {
                ReferenceCountUtil.release(msg);
                // cancelled
                return;
            }
        } catch (RuntimeException e) {
            ReferenceCountUtil.release(msg);
            throw e;
        }

        final AbstractChannelHandlerContext next = findContextOutbound(flush ?
                (MASK_WRITE | MASK_FLUSH) : MASK_WRITE);
        final Object m = pipeline.touch(msg, next);
        EventExecutor executor = next.executor();
  	//判断当前线程是否是该channel绑定的EventLoop
        if (executor.inEventLoop()) {
            if (flush) {
                next.invokeWriteAndFlush(m, promise);
            } else {
                next.invokeWrite(m, promise);
            }
        } else {
            final WriteTask task = WriteTask.newInstance(next, m, promise, flush);
            //将写任务提交到EventLoop上稍后执行
            if (!safeExecute(executor, task, promise, m, !flush)) {
                // We failed to submit the WriteTask. We need to cancel it so we decrement the pending bytes
                // and put it back in the Recycler for re-use later.
                //
                // See https://github.com/netty/netty/issues/8343.
                task.cancel();
            }
        }
    }

从这部分源码我们可以了解到Netty写消息时总是会保证把任务提交到EventLoop线程上处理，而每调度一次EventLoop线程去执行写任务WriteTask却只能写一个消息。其线程模型如下图：

而改造后的做法是将所有的消息都先提交到一个WriteQueue消息写队列上，内部会获取一次EventLoop并提交一个任务，该任务的逻辑比较简单，那就是从消息队列上不断的取消息出来并调用Netty的write。其核心源码如下：

@Override
protected void flush(MessageTuple item) {
  prepare(item);
  Object finalMessage = multiMessage;
  if (multiMessage.size() == 1) {
    finalMessage = multiMessage.get(0);
  }
  channel.writeAndFlush(finalMessage).addListener(new ChannelFutureListener() {
    @Override
    public void operationComplete(ChannelFuture future) throws Exception {
      ChannelPromise cp;
      while ((cp = promises.poll()) != null) {
        if (future.isSuccess()){
          cp.setSuccess();
        } else {
          cp.setFailure(future.cause());
        }
      }
    }
  });
  this.multiMessage.removeMessages();
}

执行该flush的逻辑时，是处于EventLoop线程的，而从前面的Netty源码我们知道，当写动作处于EventLoop线程中时是会立即执行写动作的，并不会出现线程切换的行为！那么相较于之前每次都直接在用户线程中调用writeAndFlush而言，大幅度的减少了用户线程与EventLoop线程的切换次数，也使得一次WriteTask写出的消息数量有了大幅度提高，达到批量发包的效果，以此提高dubbo协议在小报文场景下的性能。改造后的模型如下图：

3.2 用户线程encode

优化前的encode是在Netty的EventLoop上做的，而我们知道在Netty中对于同一个Channel，使用的EventLoop线程也是同一个，即Channel是与一个EventLoop线程绑定的，所有的编解码操作都会在这个绑定的EventLoop线程上执行。根据该原理，我们也可以猜到了当报文较大时，在EventLoop上执行encode的耗时也会比较大，而由于同一个Channel只能由一个已绑定的EventLoop线程做编解码操作，那就意味着多个消息并发发送时其实会在EventLoop上串行做encode。以下是jvisualvm的耗时采样图( 源于3.1.2 benchmark的listUser)

可以看到名为NettyServerWorker-3-2的线程上的FastJson2ObjectOutput.writeObject方法耗时占比很高。而根据简单的分析得知报文越大，单次encode上的耗时也就越大，那么在EventLoop上串行处理encode的耗时也就越大，使得需要发送的消息总是被堵在后面排队等待处理，极大的制约了dubbo在较大报文场景下的表现。而这类encode行为是可以在用户线程上并行执行的，在用户线程上并行编码后，再将编码后的结果交付到EventLoop线程上处理，可以大幅度减少EventLoop串行encode的带来的损耗，以此提高dubbo协议在较大报文场景下的性能！其关键代码如下

public void send(Object message, boolean sent) throws RemotingException {
    ......省略
    try {
        Object outputMessage = message;
        //是否打开了在IO线程encode，默认在用户线程上encode
        if (!encodeInIOThread) {
            //执行当前逻辑的线程是用户线程，在用户线程上调用codec.encode进行编码，得到交付EventLoop线程的消息buf
            ByteBuf buf = channel.alloc().buffer();
            ChannelBuffer buffer = new NettyBackedChannelBuffer(buf);
            codec.encode(this, buffer, message);
            outputMessage = buf;
        }
        //将写出的消息提交到WriteQueue中
        ChannelFuture future = writeQueue.enqueue(outputMessage).addListener(new ChannelFutureListener() {
            @Override
            public void operationComplete(ChannelFuture future) throws Exception {
                ......写出消息后置处理
            }
        });
   ......省略
}

Nov 10 '22 17:11 icodening