GatewayWorker
EventBase::loop(): Failed to invoke event callback, breaking the loop.
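After BusinessWorker has been running for a while, it produces the error output below. The service is deployed on K8S with 3 pods; only one pod shows the problem at any given time, never several at once.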
------------------------------------------- WORKERMAN --------------------------------------------
Workerman version:4.1.11 PHP version:8.1.16 Event-Loop:\Workerman\Events\Event
-------------------------------------------- WORKERS ---------------------------------------------
proto user worker listen processes status
tcp root BusinessWorker none 12 [OK]
--------------------------------------------------------------------------------------------------
Press Ctrl+C to stop. Start success.
GatewayConnection Error : 2 ,client closed
Exception: connection close tcp://10.2.164.000:2301 in /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php:1182
Stack trace:
#0 /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php(1111): GatewayWorker\Lib\Gateway::sendAndRecv()
#1 /home/htdocs/im/vendor/workerman/gateway-worker/src/BusinessWorker.php(362): GatewayWorker\Lib\Gateway::getSession()
#2 /home/htdocs/im/vendor/workerman/workerman/Connection/TcpConnection.php(646): GatewayWorker\BusinessWorker->onGatewayMessage()
#3 [internal function]: Workerman\Connection\TcpConnection->baseRead()
#4 /home/htdocs/im/vendor/workerman/workerman/Events/Event.php(193): EventBase->loop()
#5 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1629): Workerman\Events\Event->loop()
#6 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1423): Workerman\Worker::forkOneWorkerForLinux()
#7 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1397): Workerman\Worker::forkWorkersForLinux()
#8 /home/htdocs/im/vendor/workerman/workerman/Worker.php(560): Workerman\Worker::forkWorkers()
#9 /home/htdocs/im/src/Command/ServerCommand.php(129): Workerman\Worker::runAll()
#10 [internal function]: App\Command\ServerCommand->__invoke()
#11 /home/htdocs/im/vendor/minicli/minicli/src/App.php(239): call_user_func()
#12 /home/htdocs/im/vendor/minicli/minicli/src/App.php(218): Minicli\App->runSingle()
#13 /home/htdocs/im/app(27): Minicli\App->runCommand()
#14 {main}
PHP Warning: EventBase::loop(): Failed to invoke event callback, breaking the loop. in /home/htdocs/im/vendor/workerman/workerman/Events/Event.php on line 193
worker[BusinessWorker:14] exit with status 64000
GatewayConnection Error : 2 ,client closed
GatewayConnection Error : 2 ,client closed
GatewayConnection Error : 2 ,client closed
PHP Warning: stream_socket_client(): Unable to connect to tcp://10.2.164.000:2303 (Connection timed out) in /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php on line 1409
PHP Warning: EventBase::loop(): Failed to invoke event callback, breaking the loop. in /home/htdocs/im/vendor/workerman/workerman/Events/Event.php on line 193
Exception: can not connect to tcp://10.2.164.000:2303 Connection timed out in /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php:1411
Stack trace:
#0 /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php(1165): GatewayWorker\Lib\Gateway::getGatewayConnection()
#1 /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php(1111): GatewayWorker\Lib\Gateway::sendAndRecv()
#2 /home/htdocs/im/vendor/workerman/gateway-worker/src/BusinessWorker.php(362): GatewayWorker\Lib\Gateway::getSession()
#3 /home/htdocs/im/vendor/workerman/workerman/Connection/TcpConnection.php(646): GatewayWorker\BusinessWorker->onGatewayMessage()
#4 [internal function]: Workerman\Connection\TcpConnection->baseRead()
#5 /home/htdocs/im/vendor/workerman/workerman/Events/Event.php(193): EventBase->loop()
#6 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1629): Workerman\Events\Event->loop()
#7 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1423): Workerman\Worker::forkOneWorkerForLinux()
#8 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1397): Workerman\Worker::forkWorkersForLinux()
#9 /home/htdocs/im/vendor/workerman/workerman/Worker.php(560): Workerman\Worker::forkWorkers()
#10 /home/htdocs/im/src/Command/ServerCommand.php(129): Workerman\Worker::runAll()
#11 [internal function]: App\Command\ServerCommand->__invoke()
#12 /home/htdocs/im/vendor/minicli/minicli/src/App.php(239): call_user_func()
#13 /home/htdocs/im/vendor/minicli/minicli/src/App.php(218): Minicli\App->runSingle()
#14 /home/htdocs/im/app(27): Minicli\App->runCommand()
#15 {main}
worker[BusinessWorker:19] exit with status 64000
Querying status shows one process in an N/A [busy] state:
Workerman[/home/htdocs/im/src/Command/ServerCommand.php] status
----------------------------------------------GLOBAL STATUS----------------------------------------------------
Workerman version:4.1.11 PHP version:8.1.16
start time:2023-09-13 11:09:39 run 0 days 9 hours
load average: 15.7, 17.4, 17.13 event-loop:\Workerman\Events\Event
1 workers 12 processes
worker_name exit_status exit_count
BusinessWorker 64000 3
----------------------------------------------PROCESS STATUS---------------------------------------------------
pid memory listening worker_name connections send_fail timers total_request qps status
9 2.84M none BusinessWorker 13 0 56 1031441 0 [idle]
10 N/A none BusinessWorker N/A N/A N/A N/A N/A [busy]
11 2.81M none BusinessWorker 13 0 41 1017494 0 [idle]
12 2.81M none BusinessWorker 13 0 45 1008825 0 [idle]
13 2.82M none BusinessWorker 13 0 49 1006094 0 [idle]
16 2.84M none BusinessWorker 13 0 50 997295 0 [idle]
17 2.84M none BusinessWorker 13 0 54 1013818 0 [idle]
18 2.84M none BusinessWorker 13 0 58 986283 0 [idle]
21 2.86M none BusinessWorker 13 0 57 1016074 0 [idle]
2444 2.8M none BusinessWorker 13 0 50 1010678 0 [idle]
2989 2.75M none BusinessWorker 13 0 41 1003286 0 [idle]
6740 2.79M none BusinessWorker 13 0 56 36494 0 [idle]
----------------------------------------------PROCESS STATUS---------------------------------------------------
Summary 22M - - 143 0 557 10127782 0 [Summary]
Could the author advise how to troubleshoot and locate the cause of this problem?
The IP 10.2.164.000 is wrong.
I changed it by hand to hide the machine's IP when filing this issue. The original IP is correct; I have double-checked it.
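Is this a distributed deployment? The service at 10.2.164.000 may have a problem: it may have hit an error, and the gatewayWorker process exited. Also, judging from the status you posted, the server load is too high.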
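- Is the load read from this line?
load average: 15.7, 17.4, 17.13
Above what value is the load generally considered too high? Does the value mean the same thing for all three kinds of processes (is the reference value the same)?
- Yes, it is a distributed deployment.
- The gatewayWorker process has the following output and status: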
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
Workerman[/home/htdocs/im/src/Command/ServerCommand.php] status
----------------------------------------------GLOBAL STATUS----------------------------------------------------
Workerman version:4.1.11 PHP version:8.1.16
start time:2023-09-13 11:09:50 run 0 days 10 hours
load average: 22.3, 20.97, 20.85 event-loop:\Workerman\Events\Event
1 workers 4 processes
worker_name exit_status exit_count
Gateway 0 0
----------------------------------------------PROCESS STATUS---------------------------------------------------
pid memory listening worker_name connections send_fail timers total_request qps status
9 7.85M websocket://0.0.0.0:1216 Gateway 1154 4 3 25770080 0 [idle]
10 9.19M websocket://0.0.0.0:1216 Gateway 1377 9 3 27351740 0 [idle]
11 8.04M websocket://0.0.0.0:1216 Gateway 1190 5 3 25933475 0 [idle]
12 8.83M websocket://0.0.0.0:1216 Gateway 1321 7 3 26545133 0 [idle]
----------------------------------------------PROCESS STATUS---------------------------------------------------
Summary 32M - - 5042 25 12 105600428 0 [Summary]
load average: 15.7, 17.4, 17.13 is the load. As a rule of thumb it should not exceed 70% of the number of CPU cores.
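The 70% rule of thumb above can be checked with a few lines. This is a minimal sketch, not part of Workerman: the function names are made up for illustration, and the core counts in the usage note are hypothetical because the thread does not say how many cores the servers have.

```python
import os


def load_ratio(load_avg: float, cpu_cores: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return load_avg / cpu_cores


def load_is_high(load_avg: float, cpu_cores: int, threshold: float = 0.7) -> bool:
    """True when the load exceeds `threshold` times the core count.

    The 0.7 default mirrors the rule of thumb quoted in this thread.
    """
    return load_ratio(load_avg, cpu_cores) > threshold


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"1-min load {one_min:.2f} on {cores} cores: "
          f"ratio {load_ratio(one_min, cores):.2f}, "
          f"high={load_is_high(one_min, cores)}")
```

For example, on a hypothetical 16-core machine the reported 1-minute load of 15.7 would count as high (15.7 / 16 is above 0.7), while on a hypothetical 32-core machine it would not.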
OK, I will first try scaling out to bring the load down and then see whether the loop problem still occurs. Thank you.
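Are you running a load test? GatewayWorker's internal API calls (for example Gateway::sendToAll()) generally communicate once with every gateway process, so the fewer gateway processes there are in the cluster, the more efficient the cluster is and the lower the load. If the high load comes from frequent internal Gateway API calls, adding gateway servers will not reduce the load; it will make it higher.
If you make very frequent Gateway API calls, I recommend running only two gateway servers with 2 processes each, which lowers the load of the whole cluster.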
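It is not a load test; this is production traffic.
The current gatewayWorker setup is 3 nodes with 4 processes per node.
We call Client::sendToUid() frequently.
We will try reducing the number of gatewayWorker processes and watch the load.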
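To put the maintainer's point in numbers, here is a rough sketch of the fan-out cost. It assumes, as described earlier in the thread, that each internal Gateway API call talks to every gateway process once; the actual behaviour may differ per API. The function name and the figure of 1000 calls per second are hypothetical.

```python
def internal_messages_per_second(api_calls_per_second: int,
                                 nodes: int,
                                 processes_per_node: int) -> int:
    """Estimate internal messages when every API call reaches every gateway process."""
    if min(api_calls_per_second, nodes, processes_per_node) < 0:
        raise ValueError("arguments must be non-negative")
    gateway_processes = nodes * processes_per_node
    return api_calls_per_second * gateway_processes


if __name__ == "__main__":
    calls = 1000  # hypothetical API call rate, not taken from the thread
    # Setup reported in the thread: 3 nodes with 4 processes each.
    print("3 nodes x 4 processes:", internal_messages_per_second(calls, 3, 4))
    # Setup suggested in the thread: 2 servers with 2 processes each.
    print("2 nodes x 2 processes:", internal_messages_per_second(calls, 2, 2))
```

Under this simplified model, going from 12 gateway processes to 4 cuts the internal traffic generated by the same number of API calls to one third.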
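How is it going now? Judging from the statistics in your DEBUG panel, you have very few connections, but each connection carries a lot of traffic (a large number of request packets in a short time).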
frame not masked so close the connection
The "masked" message most likely means that a WebSocket binary frame sent by the client to the Gateway was invalid, so the Gateway judged the connection to be illegal and closed it.
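For background, RFC 6455 requires every frame sent from a client to a server to be masked, and the mask flag is the high bit of the second byte of the frame. The sketch below shows what a masked and an unmasked frame look like on the wire. It is an illustration only, not Workerman's code: the helper names are made up, and it handles only single unfragmented text frames with payloads shorter than 126 bytes.

```python
import os


def build_text_frame(payload: bytes, masked: bool) -> bytes:
    """Build one unfragmented text frame (payloads under 126 bytes only)."""
    if len(payload) >= 126:
        raise ValueError("this sketch only handles payloads under 126 bytes")
    first_byte = 0x80 | 0x1  # FIN=1, opcode=0x1 (text)
    if not masked:
        return bytes([first_byte, len(payload)]) + payload
    key = os.urandom(4)  # 4-byte masking key chosen by the client
    body = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return bytes([first_byte, 0x80 | len(payload)]) + key + body


def is_masked(frame: bytes) -> bool:
    """Return True when the MASK bit (high bit of byte 2) is set."""
    if len(frame) < 2:
        raise ValueError("a frame header needs at least 2 bytes")
    return bool(frame[1] & 0x80)


def read_short_payload(frame: bytes) -> bytes:
    """Recover the payload of a short frame, unmasking it if needed."""
    length = frame[1] & 0x7F
    if length >= 126:
        raise ValueError("this sketch only handles payloads under 126 bytes")
    if not is_masked(frame):
        return frame[2:2 + length]
    key, body = frame[2:6], frame[6:6 + length]
    return bytes(b ^ key[i % 4] for i, b in enumerate(body))


if __name__ == "__main__":
    good = build_text_frame(b"hello", masked=True)
    bad = build_text_frame(b"hello", masked=False)
    print("masked frame accepted by a strict server:", is_masked(good))
    print("unmasked frame accepted by a strict server:", is_masked(bad))
```

A client that sends frames like the unmasked one (for example a hand-written client or a non-WebSocket probe hitting the port) would be a plausible source of the repeated log line, though the thread does not identify the actual sender.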