secretpad
Two-party private set intersection (PSI) fails to execute
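- Following https://www.secretflow.org.cn/docs/quickstart/zgnd8oqo5chsqhzm, I deployed the center and two independent nodes in centralized mode, each on a separate virtual machine with its own IP. The nodes register with the platform normally, cooperation authorization between the nodes works, and data upload and project authorization both succeed.
- In the project I built a two-party PSI training flow as shown in the figure and ran it. The run failed at the PSI node with the following error: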
2024-02-04 18:15:10 INFO the jobId=pusf, taskId=pusf-skadyppw-node-2 start ...
2024-02-04 18:15:52 INFO the jobId=pusf, taskId=pusf-skadyppw-node-2 failed: party ehovblga failed msg: container[secretflow] terminated state reason "Error", message: "y.cc:LogHttpDetail:29] cntl ErrorCode '1010', http status code '503', response header '[x-b3-traceid]:[752c6e5680cd42c1];[content-length]:[145];[kuscia-error-message]:[Domain ehovblga.nsfocus-kuscia-lite-ehovblga<--Domain yrjcaulc.nsfocus-kuscia-lite-yrjcaulc<--192.168.19.83 return http code 503.];[x-accel-buffering]:[no];[x-b3-spanid]:[752c6e5680cd42c1];[x-envoy-upstream-service-time]:[4];[date]:[Sun, 04 Feb 2024 10:15:43 GMT];[server]:[envoy];', error msg '[E1010]HTTP/1.1 503 Service Unavailable: upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: delayed connect error: 111'
(SPURuntime pid=629) 2024-02-04 10:15:43.864 [info] [default_brpc_retry_policy.cc:DoRetry:75] aggressive retry, sleep=1000000us and retry
(SenderReceiverProxyActor pid=493) 2024-02-04 10:15:45 DEBUG link.py:104 [ehovblga] -- Received data for ping from ping.
(SPURuntime(device_id=None, party=ehovblga) pid=629) 2024-02-04 10:15:44.923 [info] [bucket_psi.cc:Init:315] bucket size set to 1048576
(SPURuntime(device_id=None, party=ehovblga) pid=629) 2024-02-04 10:15:44.924 [info] [bucket_psi.cc:CheckInput:229] Begin sanity check for input file: /home/kuscia/var/storage/data/data1_1998032270.csv, precheck_switch:true
(SPURuntime(device_id=None, party=ehovblga) pid=629) 2024-02-04 10:15:44.925 [info] [csv_checker.cc:CsvChecker:121] Executing duplicated scripts: LC_ALL=C sort --buffer-size=1G --temporary-directory=/home/kuscia/var/storage/data --stable selected-keys.1707041744924974630 | LC_ALL=C uniq -d > duplicate-keys.1707041744924974630
(SPURuntime(device_id=None, party=ehovblga) pid=629) 2024-02-04 10:15:45.022 [info] [bucket_psi.cc:CheckInput:246] End sanity check for input file: /home/kuscia/var/storage/data/data1_1998032270.csv, size=20
(SPURuntime(device_id=None, party=ehovblga) pid=629) 2024-02-04 10:15:45.036 [info] [bucket_psi.cc:RunPsi:348] Run psi protocol=1, self_items_count=20
"
2024-02-04 18:15:52 INFO the jobId=pusf, taskId=pusf-skadyppw-node-2 failed: The remaining no-failed party task counts 1 are less than the threshold 2 that meets the conditions for task success. pending party[], running party[yrjcaulc], successful party[], failed party[ehovblga]
3. Node configuration
Hi, please run the two commands below in a terminal and post the output:
- uname -a
- cat /proc/cpuinfo | grep avx
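The two commands above can be wrapped into a small script that reports the architecture and whether AVX/AVX2 appear among the CPU flags. This is a minimal sketch, assuming a Linux x86_64 host where `/proc/cpuinfo` lists CPU features on lines starting with `flags` (ARM hosts use a different field name). The function name `has_cpu_flag` is hypothetical. The reply does not say why AVX matters; a likely reason is that the prebuilt binaries expect it, but that is an assumption.

```shell
#!/bin/sh
# has_cpu_flag FLAG [CPUINFO]: hypothetical helper. Succeeds if FLAG appears as a
# whole word on a "flags" line of CPUINFO (default: /proc/cpuinfo, Linux x86 only).
# grep -w ensures "avx" does not match "avx2" or "avx512f".
has_cpu_flag() {
  flag="$1"
  cpuinfo="${2:-/proc/cpuinfo}"
  grep '^flags' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"
}

# Print the machine architecture (same information as part of `uname -a`).
uname -m

# Report the AVX-related flags the maintainer asked about.
for f in avx avx2; do
  if has_cpu_flag "$f"; then
    echo "$f: supported"
  else
    echo "$f: not reported"
  fi
done
```

If `avx` is reported as missing on a VM, the hypervisor may not be passing the host CPU's flags through to the guest.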