
scql run command: the timeout parameter setting has no effect

Open friendsAI opened this issue 1 year ago • 17 comments

Issue Type

Running

Have you searched for existing issues?

Yes

OS Platform and Distribution

linux v10

SCQL Version

0.8.1b

What happened and What you expected to happen.

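When I execute the run command, the --timeout parameter has no effect. I set it to 500 seconds, but a timeout error is reported almost immediately and the job does not complete. Does --timeout have a default upper limit? Why is the value the user actually sets not honored?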

Configuration used to run SCQL.

A timeout error is reported.

SCQL log output.

A timeout error is reported.

friendsAI avatar Sep 26 '24 13:09 friendsAI

Hello, could you post the command you ran and the log output?

BrainWH avatar Sep 27 '24 02:09 BrainWH

The command is as follows:

./brokerctl run "select cloud2.ID,cloud2.name,cloud2.cardno from cloud2 inner join ga on ga.ID=cloud2.ID;" --project-id "9ff175c9ca1f49bca5f363a88f4ffbd7d" --host "http://192.168.90.171:8080" --timeout 500

Part of the broker log:

2024-09-27 02:54:49.9272 ERROR executor.go:98 |RequestID:|SessionID:bad8437a-7c7b-11ef-a9b6-0242ac170002|ActionName:EngineStub@RunExecutionPlan|CostTime:54.583204491s|Reason:InvalidResponse|ErrorMsg:Error: code=320, msg="RunExecutionPlan create session(bad8437a-7c7b-11ef-a9b6-0242ac170002) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=bad8437a-7c7b-11ef-a9b6-0242ac170002:2:ALLGATHER "|Request:
2024-09-27 02:54:49.9272 ERROR common.go:139 |RequestID:|RequestParty:|SessionID:|ActionName:Intra@DoQuery|CostTime:56.623403125s|Reason:|ErrorMsg:runQuery Execute err: Error: code=320, msg="RunExecutionPlan create session(bad8437a-7c7b-11ef-a9b6-0242ac170002) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=bad8437a-7c7b-11ef-a9b6-0242ac170002:2:ALLGATHER "|Request:project_id:"9ff175c9ca1f49bca5f363a88f4ffbd7d" query:"select cloud2.ID,cloud2.name,cloud2.cardno from cloud2 inner join ga on ga.ID=cloud2.ID;" debug_opts:{} job_config:{}
2024-09-27 02:54:49.9272 INFO server.go:135 |GIN|status=200|method=POST|path=/intra/query|ip=192.168.90.171|latency=56.623593684s|

Part of the engine log:

2024-09-27 02:53:07.267 [info] [engine_service_impl.cc:RunPlanSync:571] [job(6fd502f0-7c7b-11ef-a9b6-0242ac170002)] RunExecutionPlan success, sessionID=6fd502f0-7c7b-11ef-a9b6-0242ac170002
2024-09-27 02:53:15.927 [info] [session_manager.cc:RemoveSession:226] [scqlengine] session(6fd502f0-7c7b-11ef-a9b6-0242ac170002) removed, running_cost(86796ms), current running session=0
2024-09-27 02:53:15.928 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(6fd502f0-7c7b-11ef-a9b6-0242ac170002) not exists. default return nullptr.
2024-09-27 02:53:15.928 [warning] [session.cc:ActiveLogger:336] [scqlengine] can not get valid session
2024-09-27 02:53:15.928 [info] [engine_service_impl.cc:StopJob:173] [scqlengine] EngineServiceImpl::StopJob(6fd502f0-7c7b-11ef-a9b6-0242ac170002), reason()
2024-09-27 02:53:15.928 [warning] [session_manager.cc:StopSession:174] [scqlengine] session(6fd502f0-7c7b-11ef-a9b6-0242ac170002) not exists.
2024-09-27 02:53:54.428 [warning] [listener.cc:GetListener:62] [scqlengine] Listener for link_id:bad8437a-7c7b-11ef-a9b6-0242ac170002 not exist.
2024-09-27 02:54:47.525 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(bad8437a-7c7b-11ef-a9b6-0242ac170002) not exists. default return nullptr.
2024-09-27 02:54:47.539 [error] [engine_service_impl.cc:RunExecutionPlan:301] [scqlengine] RunExecutionPlan create session(bad8437a-7c7b-11ef-a9b6-0242ac170002) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=bad8437a-7c7b-11ef-a9b6-0242ac170002:2:ALLGATHER
2024-09-27 02:54:47.540 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(bad8437a-7c7b-11ef-a9b6-0242ac170002) not exists. default return nullptr.
2024-09-27 02:54:47.540 [warning] [session.cc:ActiveLogger:336] [scqlengine] can not get valid session
2024-09-27 02:54:49.553 [warning] [engine_service_impl.cc:ReportErrorToPeers:661] [scqlengine] sync error to peer=(1200001725521632516,192.168.90.171:5006) failed: status: code: 141 message: "no session for job_id=bad8437a-7c7b-11ef-a9b6-0242ac170002"
2024-09-27 02:54:49.555 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(bad8437a-7c7b-11ef-a9b6-0242ac170002) not exists. default return nullptr.
2024-09-27 02:54:49.555 [warning] [session.cc:ActiveLogger:336] [scqlengine] can not get valid session
2024-09-27 02:54:49.555 [info] [engine_service_impl.cc:StopJob:173] [scqlengine] EngineServiceImpl::StopJob(bad8437a-7c7b-11ef-a9b6-0242ac170002), reason()
2024-09-27 02:54:49.555 [warning] [session_manager.cc:StopSession:174] [scqlengine] session(bad8437a-7c7b-11ef-a9b6-0242ac170002) not exists.
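
The timestamps in the engine log give a rough idea of how long the engine waited before it reported the failure. A minimal Python sketch, assuming the two timestamps copied by hand from the log for session bad8437a-7c7b-11ef-a9b6-0242ac170002 are the relevant ones (the helper name `elapsed_seconds` is hypothetical and not part of SCQL):

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two engine-log timestamps."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied by hand from the engine log (assumed to bracket the wait).
first_warning = "2024-09-27 02:53:54.428"  # "Listener for link_id ... not exist."
timeout_error = "2024-09-27 02:54:47.539"  # "Get data timeout ... ALLGATHER"

waited = elapsed_seconds(first_warning, timeout_error)
print(f"{waited:.3f}s between first warning and timeout error; --timeout was 500s")
```

The gap is roughly 53 seconds, far short of the 500 seconds passed to --timeout, which suggests that some other limit was reached first.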

friendsAI avatar Sep 27 '24 03:09 friendsAI

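How large is your data? Could you share how much memory the two machines have? Also, is there any bandwidth or latency limit between the two machines?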

lanyy9527 avatar Sep 27 '24 08:09 lanyy9527

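The data on both sides is very small, fewer than 100 rows each. Each machine has 64 GB of memory. Does scql have parameters for limiting bandwidth or latency? I have not limited these anywhere else.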

friendsAI avatar Sep 27 '24 08:09 friendsAI

Then it may be a connection problem between MySQL and the engine. Could you post your gflags.conf and docker-compose configuration files?

lanyy9527 avatar Sep 27 '24 09:09 lanyy9527

The gflags.conf file is configured as follows:

--listen_port=8003
--datasource_router=embed
--enable_driver_authorization=false
--server_enable_ssl=false
--driver_enable_ssl_as_client=false
--peer_engine_enable_ssl_as_client=false
--embed_router_conf={"datasources": [{"id": "ds001", "name": "mysql db", "kind": "MYSQL", "connection_str": "db=bob;user=root;password=123456;host=192.168.90.171;auto-reconnect=true"}], "rules": [{"db": "", "table": "", "datasource_id": "ds001"}]}
--enable_self_auth=false
--enable_peer_auth=false
--peer_engine_protocol=http:proto
--peer_engine_connection_type=pooled
--spu_allowed_protocols=CHEETAH

The docker-compose.yaml file is configured as follows:

version: '3.8'
services:
  broker:
    image: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/scql:latest
    command:
      - /home/admin/bin/broker
      - -config=/home/admin/configs/config.yml
    restart: always
    ports:
      - 8080:8080
      - 8081:8081
    volumes:
      - ./config.yml:/home/admin/configs/config.yml
      - ./party_info.json:/home/admin/configs/party_info.json
      - ./ed25519key.pem:/home/admin/configs/ed25519key.pem
    security_opt:
      - seccomp:unconfined
  engine:
    cap_add:
      - NET_ADMIN
    command:
      - /home/admin/bin/scqlengine
      - --flagfile=/home/admin/engine/conf/gflags.conf
    image: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/scql:latest
    ports:
      - 8003:8003
    volumes:
      - ./gflags.conf:/home/admin/engine/conf/gflags.conf
    security_opt:
      - seccomp:unconfined

friendsAI avatar Sep 27 '24 09:09 friendsAI

You could try adding the MySQL port to connection_str, then restart the service and watch the output of docker logs engine-name.
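
As a rough illustration of that suggestion, the Python sketch below adds a port field to the connection_str posted earlier. Both the key name `port` and the value 3306 (MySQL's usual default) are assumptions, since the thread does not spell them out, and `with_port` is a hypothetical helper rather than anything shipped with SCQL:

```python
def with_port(connection_str: str, port: int) -> str:
    """Return the key=value;... string with port=<port> placed right after host=."""
    # Drop empty fields and any existing port entry so the result has one port.
    fields = [f for f in connection_str.split(";") if f and not f.startswith("port=")]
    out = []
    for field in fields:
        out.append(field)
        if field.startswith("host="):
            out.append(f"port={port}")
    return ";".join(out)


# connection_str copied from the gflags.conf posted in this thread.
original = "db=bob;user=root;password=123456;host=192.168.90.171;auto-reconnect=true"

# 3306 is an assumed value; use whatever port the MySQL server actually listens on.
print(with_port(original, 3306))
```

The resulting string would replace the connection_str value inside --embed_router_conf, after which the engine needs a restart to pick it up.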

lanyy9527 avatar Sep 27 '24 10:09 lanyy9527

The engine output is as follows:

2024-09-27 10:43:16.686 [info] [main.cc:BuildRouter:108] [scqlengine] Building EmbedRouter from json conf
2024-09-27 10:43:16.687 [info] [session_manager.cc:WatchSessionTimeoutThread:250] [scqlengine] WatchSessionTimeoutThread startup, session default timeout=1800s
2024-09-27 10:43:16.687 [info] [thread_pool.cc:ThreadPool:30] [scqlengine] Create a fixed thread pool with size 32
2024-09-27 10:43:16.688 [info] [main.cc:main:330] [scqlengine] Adding EngineService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:339] [scqlengine] Adding ErrorCollectorService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:348] [scqlengine] Adding MetricsService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:358] [scqlengine] Adding MuxReceiverService into main server...
2024-09-27 10:43:16.692 [warning] [server.cpp:BRPC:1187] [scqlengine] Builtin services are disabled according to ServerOptions.has_builtin_services
2024-09-27 10:43:16.693 [info] [main.cc:main:378] [scqlengine] Started engine rpc server success, listen on: 0.0.0.0:8003

friendsAI avatar Sep 27 '24 10:09 friendsAI

I ran the same run command as before, and it still fails with an error before 500 seconds have elapsed.

Error: run query: DoQuery response: { "status": { "code": 320, "message": "RunExecutionPlan create session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=b0abbc6f-7cbd-11ef-ad09-0242ac180003:2:ALLGATHER " } }

The engine log:

2024-09-27 10:43:16.686 [info] [main.cc:BuildRouter:108] [scqlengine] Building EmbedRouter from json conf
2024-09-27 10:43:16.687 [info] [session_manager.cc:WatchSessionTimeoutThread:250] [scqlengine] WatchSessionTimeoutThread startup, session default timeout=1800s
2024-09-27 10:43:16.687 [info] [thread_pool.cc:ThreadPool:30] [scqlengine] Create a fixed thread pool with size 32
2024-09-27 10:43:16.688 [info] [main.cc:main:330] [scqlengine] Adding EngineService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:339] [scqlengine] Adding ErrorCollectorService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:348] [scqlengine] Adding MetricsService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:358] [scqlengine] Adding MuxReceiverService into main server...
2024-09-27 10:43:16.692 [warning] [server.cpp:BRPC:1187] [scqlengine] Builtin services are disabled according to ServerOptions.has_builtin_services
2024-09-27 10:43:16.693 [info] [main.cc:main:378] [scqlengine] Started engine rpc server success, listen on: 0.0.0.0:8003
2024-09-27 10:46:03.647 [warning] [listener.cc:GetListener:62] [scqlengine] Listener for link_id:b0abbc6f-7cbd-11ef-ad09-0242ac180003 not exist.
2024-09-27 10:46:58.754 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) not exists. default return nullptr.
2024-09-27 10:46:58.757 [error] [engine_service_impl.cc:RunExecutionPlan:301] [scqlengine] RunExecutionPlan create session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=b0abbc6f-7cbd-11ef-ad09-0242ac180003:2:ALLGATHER
2024-09-27 10:46:58.758 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) not exists. default return nullptr.
2024-09-27 10:46:58.758 [warning] [session.cc:ActiveLogger:336] [scqlengine] can not get valid session
2024-09-27 10:47:00.771 [warning] [engine_service_impl.cc:ReportErrorToPeers:661] [scqlengine] sync error to peer=(1200001725521632516,192.168.90.171:5006) failed: status: code: 141 message: "no session for job_id=b0abbc6f-7cbd-11ef-ad09-0242ac180003"

The broker log is as follows:

2024-09-27 10:47:00.773 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) not exists. default return nullptr.
2024-09-27 10:47:00.773 [warning] [session.cc:ActiveLogger:336] [scqlengine] can not get valid session
2024-09-27 10:47:00.773 [info] [engine_service_impl.cc:StopJob:173] [scqlengine] EngineServiceImpl::StopJob(b0abbc6f-7cbd-11ef-ad09-0242ac180003), reason()
2024-09-27 10:47:00.773 [warning] [session_manager.cc:StopSession:174] [scqlengine] session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) not exists.

2024-09-27 10:47:00.92710 ERROR executor.go:98 |RequestID:|SessionID:b0abbc6f-7cbd-11ef-ad09-0242ac180003|ActionName:EngineStub@RunExecutionPlan|CostTime:56.08233881s|Reason:InvalidResponse|ErrorMsg:Error: code=320, msg="RunExecutionPlan create session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=b0abbc6f-7cbd-11ef-ad09-0242ac180003:2:ALLGATHER "|Request:
2024-09-27 10:47:00.92710 ERROR common.go:139 |RequestID:|RequestParty:|SessionID:|ActionName:Intra@DoQuery|CostTime:58.126842499s|Reason:|ErrorMsg:runQuery Execute err: Error: code=320, msg="RunExecutionPlan create session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) failed, catch std::exception=[external/yacl/yacl/link/transport/ debug_opts:{} job_config:{}

friendsAI avatar Sep 27 '24 10:09 friendsAI

Try increasing link_recv_timeout_ms and see whether that helps.

lanyy9527 avatar Sep 29 '24 02:09 lanyy9527

OK, I'll try it. The official documentation says the default value is 30000; is the unit milliseconds?

friendsAI avatar Sep 29 '24 02:09 friendsAI

Yes, the default is 30 s.
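
For illustration, raising the limit amounts to one extra line in the engine's gflags.conf. The Python sketch below edits flagfile text in memory; the value 120000 (two minutes, in milliseconds) is only an example and not a recommendation from this thread, and `set_flag` is a hypothetical helper, not an SCQL tool:

```python
def set_flag(flagfile_text: str, name: str, value: str) -> str:
    """Return flagfile text with --name=value set, replacing any existing entry."""
    prefix = f"--{name}="
    lines = flagfile_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith(prefix):
            lines[i] = prefix + value
            replaced = True
    if not replaced:
        # The flag was not present, so append it as a new line.
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"


# First two flags copied from the gflags.conf posted in this thread.
conf = "--listen_port=8003\n--datasource_router=embed\n"

# 120000 ms is an assumed example value; the default mentioned above is 30000.
updated = set_flag(conf, "link_recv_timeout_ms", "120000")
print(updated)
```

The engine reads gflags.conf at startup, so the container would need a restart before a new value applies.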

lanyy9527 avatar Sep 29 '24 02:09 lanyy9527

image

I ran my run statement, so why does the error message refer to a different SQL statement? Is this a SQL statement that you preset?

friendsAI avatar Sep 29 '24 03:09 friendsAI

Check the command you used to create the table and whether it specifies this table.

lanyy9527 avatar Sep 29 '24 03:09 lanyy9527

image No, it does not. This table does not even exist in this project.

friendsAI avatar Sep 29 '24 03:09 friendsAI

Please post the create table command for bob.

lanyy9527 avatar Sep 29 '24 05:09 lanyy9527

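Oh, I checked, and it was my mistake. Thanks. As for the timeout setting, I'll reply once I have finished testing whether setting link_recv_timeout_ms takes effect. Thanks!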

friendsAI avatar Sep 29 '24 06:09 friendsAI

Stale issue message. Please comment to remove stale tag. Otherwise this issue will be closed soon.

github-actions[bot] avatar Oct 29 '24 09:10 github-actions[bot]