dubbo icon indicating copy to clipboard operation
dubbo copied to clipboard

org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2021-08-15 23:04:35.615, end time: 2021-08-15 23:05:06.234, client elapsed: 20491 ms, server elapsed: 10128 ms, timeout: 100 ms

Open LiWenGu opened this issue 3 years ago • 28 comments

  • [x] I have searched the issues of this repository and believe that this is not a duplicate.
  • [x] I have checked the FAQ of this repository and believe that this is not a duplicate.

Environment

  • Dubbo version: 2.7.3
  • Operating System version: Red Hat 4.8.5
  • Java version: openjdk8

Steps to reproduce this issue

It happened online. I tried it locally and couldn't replicate it.

online log:

org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2021-08-15 23:04:35.615, end time: 2021-08-15 23:05:06.234, client elapsed: 20491 ms, server elapsed: 10128 ms, timeout: 100 ms, request: Request

A timeline based on online consumer and provider logs:

client server
start 1629039875615(2021-08-15 23:04:35.615)
send 1629039896106(2021-08-15 23:04:56.106)
server biz process start(2021-08-15 23:04:56.106)
server biz process end(2021-08-15 23:04:56.109)
received 1629039906234(2021-08-15 23:05:06.234)

Expected Result

Why does the consumer timeout period not take effect

relation issues: #4732 #103

LiWenGu avatar Aug 16 '21 07:08 LiWenGu

First, check network status, the log shows that it took more than 10s for client got response.Second, check application status, it also shows that application had took more than 20s for sending request

dkisser avatar Aug 16 '21 12:08 dkisser

当时我们有很多接口在请求,就这一个接口出现这种情况,第一次碰到这种 client start -> send 需要 20s 的情况,极其夸张,这种。通过 zabbix 查看当前机器的性能还有网络,指标也是正常的。该应用的其它接口请求是正常的。我本地测试也没法复现当时的情况。很奇怪。另外我看很久之前就有人问过这个类似的问题,但是没有解决方案

LiWenGu avatar Aug 18 '21 02:08 LiWenGu

查过线程池的状态了没?

xiaoheng1 avatar Aug 19 '21 11:08 xiaoheng1

查过线程池的状态了没?

我去查看了,发现 005 机器线程数量飙升,这个日志就是 005 机器爆的异常。至于线程池状态没法定位到当时的快照信息。我再看看当时的业务日志有什么异常,这个线程数量突然变多是导致这个的原因?

image

LiWenGu avatar Aug 20 '21 08:08 LiWenGu

我去查看了,发现 005 机器线程数量飙升,这个日志就是 005 机器爆的异常。至于线程池状态没法定位到当时的快照信息。我再看看当时的业务日志有什么异常,这个线程数量突然变多是导致这个的原因?

线程数量飙升可能会引起这个问题。具体确认的话,还是要结合当时的日志,机器状态来确认。

方便贴下你们线程池这块的配置不?

xiaoheng1 avatar Aug 21 '21 02:08 xiaoheng1

dubbo线程池用的 cached image

我看了下 AbortPolicyWithReport.java 类,也打印出当前的线程栈信息了

image

可能需要先分析下这个线程栈信息才能解决当时的问题

LiWenGu avatar Aug 24 '21 02:08 LiWenGu

这是那个时间点的 dubbo_Stack.log 文件,ip做了处理。我使用各种工具分析都说识别不出这个文件为标准的 线程 dump 文件。

Dubbo_JStack.log.2021-08-15_23:03:40.TXT

LiWenGu avatar Aug 24 '21 02:08 LiWenGu

你把 dispatcher 换成 all 试试,看还会不会出现问题

xiaoheng1 avatar Aug 24 '21 06:08 xiaoheng1

2.7.5 昨天也遇到该问题,持续一分钟左右自动恢复,排除了 FullGC, 暂时没找到原因

Caused by: org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2021-09-12 17:07:02.423, end time: 2021-09-12 17:07:22.432, client elapsed: 19217 ms, server elapsed: 791 ms, timeout: 20000 ms, request: Request [id=9841521, version=2.0.2, twoway=true, event=false, broken=false, data=null], channel: /172.31.168.27:43984 -> /172.31.135.36:20891
	at org.apache.dubbo.remoting.exchange.support.DefaultFuture.doReceived(DefaultFuture.java:210) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.remoting.exchange.support.DefaultFuture.received(DefaultFuture.java:175) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.remoting.exchange.support.DefaultFuture$TimeoutCheckTask.notifyTimeout(DefaultFuture.java:297) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.remoting.exchange.support.DefaultFuture$TimeoutCheckTask.lambda$run$0(DefaultFuture.java:284) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.common.threadpool.ThreadlessExecutor.waitAndDrain(ThreadlessExecutor.java:77) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.rpc.AsyncRpcResult.get(AsyncRpcResult.java:175) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.rpc.protocol.AsyncToSyncInvoker.invoke(AsyncToSyncInvoker.java:61) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:81) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.monitor.support.MonitorFilter.invoke(MonitorFilter.java:89) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:81) ~[dubbo-2.7.5.jar:2.7.5]
	at com.alibaba.csp.sentinel.adapter.dubbo.DubboAppContextFilter.invoke(DubboAppContextFilter.java:43) ~[sentinel-apache-dubbo-adapter-1.8.0.jar:?]
	at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:81) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.rpc.protocol.dubbo.filter.FutureFilter.invoke(FutureFilter.java:49) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:81) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.rpc.filter.ConsumerContextFilter.invoke(ConsumerContextFilter.java:55) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.rpc.protocol.ProtocolFilterWrapper$1.invoke(ProtocolFilterWrapper.java:81) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.rpc.listener.ListenerInvokerWrapper.invoke(ListenerInvokerWrapper.java:78) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.rpc.protocol.InvokerWrapper.invoke(InvokerWrapper.java:56) ~[dubbo-2.7.5.jar:2.7.5]
	at org.apache.dubbo.rpc.cluster.support.FailoverClusterInvoker.doInvoke(FailoverClusterInvoker.java:82) ~[dubbo-2.7.5.jar:2.7.5]
	... 51 more

tianshuang avatar Sep 13 '21 01:09 tianshuang

服务提供方没有 dubbo 异常日志

tianshuang avatar Sep 13 '21 01:09 tianshuang

配置信息方便贴下不 provider 端的

xiaoheng1 avatar Sep 14 '21 06:09 xiaoheng1

<dubbo:protocol threadpool="eager" threads="200" queues="1024" name="dubbo" port="20891"/>
<dubbo:provider timeout="60000" retries="0" filter="-exception,traceProviderFilter,methodFilter,exceptionFilter" />

tianshuang avatar Sep 14 '21 08:09 tianshuang

dispatcher 选择的是哪一个

xiaoheng1 avatar Sep 14 '21 09:09 xiaoheng1

默认的,没调过

tianshuang avatar Sep 15 '21 01:09 tianshuang

那你这种情况看不出来是啥原因导致的,我记得 2.7.5 中有一个是关于 dispatcher 的问题,会导致 server 没有发包给 client 导致超时的问题

xiaoheng1 avatar Sep 15 '21 06:09 xiaoheng1

@tianshuang Has this problem been resolved?

CrazyHZM avatar Sep 22 '21 23:09 CrazyHZM

@tianshuang Has this problem been resolved? no

tianshuang avatar Sep 24 '21 01:09 tianshuang

@tianshuang 2.7.14 Does the problem exist?

CrazyHZM avatar Dec 24 '21 07:12 CrazyHZM

@tianshuang 2.7.14 Does the problem exist?

We haven’t upgraded the version, so we don’t know. 😄

tianshuang avatar Dec 24 '21 07:12 tianshuang

@tianshuang It is recommended to upgrade to 2.7.14 and observe whether the problem can be reproduced. 2.7.5 is not recommended for production and large-scale use

CrazyHZM avatar Dec 24 '21 07:12 CrazyHZM

@tianshuang It is recommended to upgrade to 2.7.14 and observe whether the problem can be reproduced. 2.7.5 is not recommended for production and large-scale use

We are mainly worried about compatibility issues, because production services need to remain stable.

tianshuang avatar Dec 24 '21 07:12 tianshuang

  • [x] I have searched the issues of this repository and believe that this is not a duplicate.
  • [x] I have checked the FAQ of this repository and believe that this is not a duplicate.

Environment

  • Dubbo version: 2.7.3
  • Operating System version: Red Hat 4.8.5
  • Java version: openjdk8

Steps to reproduce this issue

It happened online. I tried it locally and couldn't replicate it.

online log:

org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2021-08-15 23:04:35.615, end time: 2021-08-15 23:05:06.234, client elapsed: 20491 ms, server elapsed: 10128 ms, timeout: 100 ms, request: Request

A timeline based on online consumer and provider logs:

client server start 1629039875615(2021-08-15 23:04:35.615) send 1629039896106(2021-08-15 23:04:56.106) server biz process start(2021-08-15 23:04:56.106) server biz process end(2021-08-15 23:04:56.109) received 1629039906234(2021-08-15 23:05:06.234)

Expected Result

Why does the consumer timeout period not take effect

relation issues: #4732 #103

问题解决了么,我是dubbo2.7.3,也会出这样的问题

helei810304 avatar May 06 '22 09:05 helei810304

+1昨天我们也出现了,服务提供方没有异常,然后版本是2.7.6

totalo avatar Jun 03 '22 02:06 totalo

dubbo 3.1.4 nacos 测试用例报错 找不到任何原因 哪位那位大哥帮帮忙 image

90706827 avatar Dec 26 '22 17:12 90706827

dubbo 3.1.4 nacos 测试用例报错 找不到任何原因 哪位那位大哥帮帮忙 image

单机的检查是否有 socks5 的代理

AlbumenJ avatar Dec 30 '22 08:12 AlbumenJ

Caused by: org.apache.dubbo.remoting.TimeoutException: Waiting server-side response timeout by scan timer. start time: 2023-05-22 20:54:49.553, end time: 2023-05-22 20:54:50.572, client elapsed: 1 ms, server elapsed: 1018 ms, timeout: 1000 ms, at org.apache.dubbo.remoting.exchange.support.DefaultFuture.doReceived(DefaultFuture.java:205) at org.apache.dubbo.remoting.exchange.support.DefaultFuture.received(DefaultFuture.java:170) at org.apache.dubbo.remoting.exchange.support.DefaultFuture$TimeoutCheckTask.notifyTimeout(DefaultFuture.java:288) at org.apache.dubbo.remoting.exchange.support.DefaultFuture$TimeoutCheckTask.lambda$run$0(DefaultFuture.java:275) at org.apache.dubbo.common.threadpool.ThreadlessExecutor$RunnableWrapper.run(ThreadlessExecutor.java:196) at org.apache.dubbo.common.threadpool.ThreadlessExecutor.waitAndDrain(ThreadlessExecutor.java:99) at org.apache.dubbo.rpc.AsyncRpcResult.get(AsyncRpcResult.java:179) ... 49 common frames omitted

Dubbo Version 2.7.15 exist too

worker24h avatar May 22 '23 15:05 worker24h

解决了吗?2024-05-22 09:14:58;dubbo版本2.7.15也遇到了这个问题

dongjiancom avatar May 22 '24 01:05 dongjiancom

12个pod只有1个或2个出现这个问题,dubbo 线程池设置为3000,有问题的pod都满了,其他机器正常。

dongjiancom avatar May 22 '24 01:05 dongjiancom