scql
scql: the timeout parameter of the run command has no effect
Issue Type
Running
Have you searched for existing issues?
Yes
OS Platform and Distribution
linux v10
SCQL Version
0.8.1b
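What happened and what you expected to happen.
When I execute the run command, the --timeout parameter has no effect. I set it to 500 seconds, but a timeout is reported very quickly and the job does not complete. Does --timeout have a default upper limit? Why is the value supplied by the user not honored?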
Configuration used to run SCQL.
A timeout error is reported.
SCQL log output.
A timeout error is reported.
Hello, could you post the command you ran and the log output?
The command is as follows:
./brokerctl run "select cloud2.ID,cloud2.name,cloud2.cardno from cloud2 inner join ga on ga.ID=cloud2.ID;" --project-id "9ff175c9ca1f49bca5f363a88f4ffbd7d" --host "http://192.168.90.171:8080" --timeout 500

Part of the broker log:
2024-09-27 02:54:49.9272 ERROR executor.go:98 |RequestID:|SessionID:bad8437a-7c7b-11ef-a9b6-0242ac170002|ActionName:EngineStub@RunExecutionPlan|CostTime:54.583204491s|Reason:InvalidResponse|ErrorMsg:Error: code=320, msg="RunExecutionPlan create session(bad8437a-7c7b-11ef-a9b6-0242ac170002) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=bad8437a-7c7b-11ef-a9b6-0242ac170002:2:ALLGATHER "|Request:
2024-09-27 02:54:49.9272 ERROR common.go:139 |RequestID:|RequestParty:|SessionID:|ActionName:Intra@DoQuery|CostTime:56.623403125s|Reason:|ErrorMsg:runQuery Execute err: Error: code=320, msg="RunExecutionPlan create session(bad8437a-7c7b-11ef-a9b6-0242ac170002) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=bad8437a-7c7b-11ef-a9b6-0242ac170002:2:ALLGATHER "|Request:project_id:"9ff175c9ca1f49bca5f363a88f4ffbd7d" query:"select cloud2.ID,cloud2.name,cloud2.cardno from cloud2 inner join ga on ga.ID=cloud2.ID;" debug_opts:{} job_config:{}
2024-09-27 02:54:49.9272 INFO server.go:135 |GIN|status=200|method=POST|path=/intra/query|ip=192.168.90.171|latency=56.623593684s|

Part of the engine log:
2024-09-27 02:53:07.267 [info] [engine_service_impl.cc:RunPlanSync:571] [job(6fd502f0-7c7b-11ef-a9b6-0242ac170002)] RunExecutionPlan success, sessionID=6fd502f0-7c7b-11ef-a9b6-0242ac170002
2024-09-27 02:53:15.927 [info] [session_manager.cc:RemoveSession:226] [scqlengine] session(6fd502f0-7c7b-11ef-a9b6-0242ac170002) removed, running_cost(86796ms), current running session=0
2024-09-27 02:53:15.928 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(6fd502f0-7c7b-11ef-a9b6-0242ac170002) not exists. default return nullptr.
2024-09-27 02:53:15.928 [warning] [session.cc:ActiveLogger:336] [scqlengine] can not get valid session
2024-09-27 02:53:15.928 [info] [engine_service_impl.cc:StopJob:173] [scqlengine] EngineServiceImpl::StopJob(6fd502f0-7c7b-11ef-a9b6-0242ac170002), reason()
2024-09-27 02:53:15.928 [warning] [session_manager.cc:StopSession:174] [scqlengine] session(6fd502f0-7c7b-11ef-a9b6-0242ac170002) not exists.
2024-09-27 02:53:54.428 [warning] [listener.cc:GetListener:62] [scqlengine] Listener for link_id:bad8437a-7c7b-11ef-a9b6-0242ac170002 not exist.
2024-09-27 02:54:47.525 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(bad8437a-7c7b-11ef-a9b6-0242ac170002) not exists. default return nullptr.
2024-09-27 02:54:47.539 [error] [engine_service_impl.cc:RunExecutionPlan:301] [scqlengine] RunExecutionPlan create session(bad8437a-7c7b-11ef-a9b6-0242ac170002) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=bad8437a-7c7b-11ef-a9b6-0242ac170002:2:ALLGATHER
2024-09-27 02:54:47.540 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(bad8437a-7c7b-11ef-a9b6-0242ac170002) not exists. default return nullptr.
2024-09-27 02:54:47.540 [warning] [session.cc:ActiveLogger:336] [scqlengine] can not get valid session
2024-09-27 02:54:49.553 [warning] [engine_service_impl.cc:ReportErrorToPeers:661] [scqlengine] sync error to peer=(1200001725521632516,192.168.90.171:5006) failed: status: code: 141 message: "no session for job_id=bad8437a-7c7b-11ef-a9b6-0242ac170002"
2024-09-27 02:54:49.555 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(bad8437a-7c7b-11ef-a9b6-0242ac170002) not exists. default return nullptr.
2024-09-27 02:54:49.555 [warning] [session.cc:ActiveLogger:336] [scqlengine] can not get valid session
2024-09-27 02:54:49.555 [info] [engine_service_impl.cc:StopJob:173] [scqlengine] EngineServiceImpl::StopJob(bad8437a-7c7b-11ef-a9b6-0242ac170002), reason()
2024-09-27 02:54:49.555 [warning] [session_manager.cc:StopSession:174] [scqlengine] session(bad8437a-7c7b-11ef-a9b6-0242ac170002) not exists.
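The CostTime fields in the broker log show how long the request ran before it failed, which makes it easy to compare against the 500 seconds passed to --timeout. The following is a minimal Python sketch for pulling those values out of a pasted log. The function name `cost_times_seconds` is made up for illustration, the sample lines are trimmed excerpts of the broker log above, and the regular expression only assumes the seconds-only form seen in this thread (for example `54.583204491s`); durations printed with minutes or hours would not match.

```python
import re

# Matches the seconds-only CostTime form seen in this thread, e.g. "CostTime:54.583204491s".
COST_TIME = re.compile(r"CostTime:(\d+(?:\.\d+)?)s\b")


def cost_times_seconds(log_text: str) -> list[float]:
    """Return every CostTime value found in log_text, in seconds."""
    return [float(value) for value in COST_TIME.findall(log_text)]


# Trimmed excerpts of the two broker ERROR lines (fields shortened for readability).
broker_log = (
    "ERROR executor.go:98 |ActionName:EngineStub@RunExecutionPlan|"
    "CostTime:54.583204491s|Reason:InvalidResponse|\n"
    "ERROR common.go:139 |ActionName:Intra@DoQuery|"
    "CostTime:56.623403125s|Reason:|\n"
)

requested_timeout = 500.0  # value passed to --timeout in the command above
for cost in cost_times_seconds(broker_log):
    print(f"failed after {cost:.1f}s; requested timeout was {requested_timeout:.0f}s")
```

Both values are under a minute, so the failure happens long before the requested 500 seconds, and the error text itself comes from yacl's channel.cc ("Get data timeout") rather than from the brokerctl client.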
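How large is your data? Could you share how much memory the two machines have? Also, is there any bandwidth or latency restriction between the two machines?
The data on both sides is very small, fewer than 100 rows each. Each machine has 64 GB of memory. Does SCQL have a parameter for limiting bandwidth or latency? I have not restricted this anywhere else.
Then it may be a connection problem between MySQL and the engine. Could you post your gflags.conf and docker-compose configuration files?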
The gflags.conf file is configured as follows:
--listen_port=8003
--datasource_router=embed
--enable_driver_authorization=false
--server_enable_ssl=false
--driver_enable_ssl_as_client=false
--peer_engine_enable_ssl_as_client=false
--embed_router_conf={"datasources": [{"id": "ds001", "name": "mysql db", "kind": "MYSQL", "connection_str": "db=bob;user=root;password=123456;host=192.168.90.171;auto-reconnect=true"}], "rules": [{"db": "", "table": "", "datasource_id": "ds001"}]}
--enable_self_auth=false
--enable_peer_auth=false
--peer_engine_protocol=http:proto
--peer_engine_connection_type=pooled
--spu_allowed_protocols=CHEETAH
The docker-compose.yaml file is configured as follows:
version: '3.8'
services:
  broker:
    image: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/scql:latest
    command:
      - /home/admin/bin/broker
      - -config=/home/admin/configs/config.yml
    restart: always
    ports:
      - 8080:8080
      - 8081:8081
    volumes:
      - ./config.yml:/home/admin/configs/config.yml
      - ./party_info.json:/home/admin/configs/party_info.json
      - ./ed25519key.pem:/home/admin/configs/ed25519key.pem
    security_opt:
      - seccomp:unconfined
  engine:
    cap_add:
      - NET_ADMIN
    command:
      - /home/admin/bin/scqlengine
      - --flagfile=/home/admin/engine/conf/gflags.conf
    image: secretflow-registry.cn-hangzhou.cr.aliyuncs.com/secretflow/scql:latest
    ports:
      - 8003:8003
    volumes:
      - ./gflags.conf:/home/admin/engine/conf/gflags.conf
    security_opt:
      - seccomp:unconfined
You could try adding the MySQL port to connection_str, then restart the service and watch the output of docker logs engine-name.
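As a rough illustration of that suggestion, the Python sketch below adds a port entry to a semicolon-separated connection string like the one in the gflags.conf posted earlier. Two things here are assumptions rather than facts from this thread: that the key is spelled `port`, and that the server listens on 3306 (MySQL's default port). The helper name `with_port` is made up, and the simple split on `;` would not cope with a password that itself contains a semicolon.

```python
def with_port(connection_str: str, port: int = 3306) -> str:
    """Return connection_str with a port=<port> entry added, if it has none.

    Assumes a "key=value;key=value" format and that the key is named "port".
    """
    parts = [p for p in connection_str.split(";") if p]
    keys = [p.split("=", 1)[0].strip().lower() for p in parts]
    if "port" in keys:
        return connection_str  # already specified, leave untouched

    result = []
    inserted = False
    for key, part in zip(keys, parts):
        result.append(part)
        if key == "host" and not inserted:
            # Keep host and port next to each other for readability.
            result.append(f"port={port}")
            inserted = True
    if not inserted:
        result.append(f"port={port}")
    return ";".join(result)


original = "db=bob;user=root;password=123456;host=192.168.90.171;auto-reconnect=true"
print(with_port(original))
```

The resulting string would replace the connection_str value inside --embed_router_conf, followed by a restart of the engine container.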
The engine output is as follows:
2024-09-27 10:43:16.686 [info] [main.cc:BuildRouter:108] [scqlengine] Building EmbedRouter from json conf
2024-09-27 10:43:16.687 [info] [session_manager.cc:WatchSessionTimeoutThread:250] [scqlengine] WatchSessionTimeoutThread startup, session default timeout=1800s
2024-09-27 10:43:16.687 [info] [thread_pool.cc:ThreadPool:30] [scqlengine] Create a fixed thread pool with size 32
2024-09-27 10:43:16.688 [info] [main.cc:main:330] [scqlengine] Adding EngineService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:339] [scqlengine] Adding ErrorCollectorService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:348] [scqlengine] Adding MetricsService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:358] [scqlengine] Adding MuxReceiverService into main server...
2024-09-27 10:43:16.692 [warning] [server.cpp:BRPC:1187] [scqlengine] Builtin services are disabled according to ServerOptions.has_builtin_services
2024-09-27 10:43:16.693 [info] [main.cc:main:378] [scqlengine] Started engine rpc server success, listen on: 0.0.0.0:8003
I ran the same run command as before. It still fails well before 500 seconds have passed:
Error: run query: DoQuery response: { "status": { "code": 320, "message": "RunExecutionPlan create session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=b0abbc6f-7cbd-11ef-ad09-0242ac180003:2:ALLGATHER " } }
Engine log:
2024-09-27 10:43:16.686 [info] [main.cc:BuildRouter:108] [scqlengine] Building EmbedRouter from json conf
2024-09-27 10:43:16.687 [info] [session_manager.cc:WatchSessionTimeoutThread:250] [scqlengine] WatchSessionTimeoutThread startup, session default timeout=1800s
2024-09-27 10:43:16.687 [info] [thread_pool.cc:ThreadPool:30] [scqlengine] Create a fixed thread pool with size 32
2024-09-27 10:43:16.688 [info] [main.cc:main:330] [scqlengine] Adding EngineService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:339] [scqlengine] Adding ErrorCollectorService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:348] [scqlengine] Adding MetricsService into brpc server
2024-09-27 10:43:16.689 [info] [main.cc:main:358] [scqlengine] Adding MuxReceiverService into main server...
2024-09-27 10:43:16.692 [warning] [server.cpp:BRPC:1187] [scqlengine] Builtin services are disabled according to ServerOptions.has_builtin_services
2024-09-27 10:43:16.693 [info] [main.cc:main:378] [scqlengine] Started engine rpc server success, listen on: 0.0.0.0:8003
2024-09-27 10:46:03.647 [warning] [listener.cc:GetListener:62] [scqlengine] Listener for link_id:b0abbc6f-7cbd-11ef-ad09-0242ac180003 not exist.
2024-09-27 10:46:58.754 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) not exists. default return nullptr.
2024-09-27 10:46:58.757 [error] [engine_service_impl.cc:RunExecutionPlan:301] [scqlengine] RunExecutionPlan create session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=b0abbc6f-7cbd-11ef-ad09-0242ac180003:2:ALLGATHER
2024-09-27 10:46:58.758 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) not exists. default return nullptr.
2024-09-27 10:46:58.758 [warning] [session.cc:ActiveLogger:336] [scqlengine] can not get valid session
2024-09-27 10:47:00.771 [warning] [engine_service_impl.cc:ReportErrorToPeers:661] [scqlengine] sync error to peer=(1200001725521632516,192.168.90.171:5006) failed: status: code: 141 message: "no session for job_id=b0abbc6f-7cbd-11ef-ad09-0242ac180003"

Broker log:
2024-09-27 10:47:00.773 [warning] [session_manager.cc:GetSession:156] [scqlengine] session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) not exists. default return nullptr.
2024-09-27 10:47:00.773 [warning] [session.cc:ActiveLogger:336] [scqlengine] can not get valid session
2024-09-27 10:47:00.773 [info] [engine_service_impl.cc:StopJob:173] [scqlengine] EngineServiceImpl::StopJob(b0abbc6f-7cbd-11ef-ad09-0242ac180003), reason()
2024-09-27 10:47:00.773 [warning] [session_manager.cc:StopSession:174] [scqlengine] session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) not exists.
2024-09-27 10:47:00.92710 ERROR executor.go:98 |RequestID:|SessionID:b0abbc6f-7cbd-11ef-ad09-0242ac180003|ActionName:EngineStub@RunExecutionPlan|CostTime:56.08233881s|Reason:InvalidResponse|ErrorMsg:Error: code=320, msg="RunExecutionPlan create session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) failed, catch std::exception=[external/yacl/yacl/link/transport/channel.cc:427] Get data timeout, key=b0abbc6f-7cbd-11ef-ad09-0242ac180003:2:ALLGATHER "|Request:
2024-09-27 10:47:00.92710 ERROR common.go:139 |RequestID:|RequestParty:|SessionID:|ActionName:Intra@DoQuery|CostTime:58.126842499s|Reason:|ErrorMsg:runQuery Execute err: Error: code=320, msg="RunExecutionPlan create session(b0abbc6f-7cbd-11ef-ad09-0242ac180003) failed, catch std::exception=[external/yacl/yacl/link/transport/ debug_opts:{} job_config:{}
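Try increasing link_recv_timeout_ms and see whether that helps.
OK, I'll try that. The official documentation says the default value is 30000. Is the unit milliseconds?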
Yes, the default is 30 s.
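Since the value is in milliseconds, a 500-second wait corresponds to 500000. The Python sketch below converts seconds to milliseconds and sets or replaces a flag line in the text of a flag file. It assumes link_recv_timeout_ms is set as a line in the engine's gflags.conf, in the same `--name=value` form as the other options posted earlier in this thread; the helper names `seconds_to_ms` and `set_flag` are made up, and 500 seconds is just the value from the original run command, not a recommendation.

```python
def seconds_to_ms(seconds: int) -> int:
    """Convert whole seconds to milliseconds."""
    return seconds * 1000


def set_flag(flagfile_text: str, name: str, value: str) -> str:
    """Set --name=value in flag-file text, replacing an existing line if present."""
    prefix = f"--{name}="
    lines = flagfile_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith(prefix):
            lines[i] = prefix + value
            replaced = True
    if not replaced:
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"


# Assumption: the flag belongs in the engine's gflags.conf next to the existing options.
conf = "--listen_port=8003\n--datasource_router=embed\n"
updated = set_flag(conf, "link_recv_timeout_ms", str(seconds_to_ms(500)))
print(updated)
```

The engine reads its flag file at startup (via --flagfile in the docker-compose command), so a restart of the engine container would be needed for a changed value to apply.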
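When I execute the run statement, why does the error message refer to a different SQL statement? Is that one of your preset SQL statements?
Check the command you used to create the table and see whether it specifies that table.
It does not. That table does not exist anywhere in this project.
Please post the create table command for bob.
Oh, I checked, and it was my mistake. Thanks. As for the timeout setting, I will reply once I have tested whether setting link_recv_timeout_ms takes effect. Thanks!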
Stale issue message. Please comment to remove stale tag. Otherwise this issue will be closed soon.