kuscia
Directly calling the ss_compare component fails
Issue Type
Bug
Deployment
docker
Kuscia Version
0.8.0.dev240409
OS Platform and Distribution
Ubuntu 22.04 (WSL)
Docker version
No response
K8s version
No response
App Running type
secretflow
App Running version
secretflow 1.5.0.dev20240304
Configuration file used to run kuscia.
# alice kuscia.yaml
mode: lite
domainID: alice
domainKeyData:
...
logLevel: INFO
liteDeployToken: iTcrIJrJiavapT8tsSD7qp5JChuB2fJ7
masterEndpoint: https://172.27.39.116:18080
runtime: runc
runk:
namespace: ""
dnsServers: []
kubeconfigFile: ""
capacity:
cpu: ""
memory: ""
pods: ""
storage: ""
image:
pullPolicy: ""
defaultRegistry: ""
registries: []
# bob kuscia.yaml
mode: lite
domainID: bob
domainKeyData:
...
logLevel: INFO
liteDeployToken: ncP44FGpJsDHCDSlMJlzwgUFRih95vgY
masterEndpoint: https://172.27.39.116:18080
runtime: runc
runk:
namespace: ""
dnsServers: []
kubeconfigFile: ""
capacity:
cpu: ""
memory: ""
pods: ""
storage: ""
image:
pullPolicy: ""
defaultRegistry: ""
registries: []
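What happened and What you expected to happen.
I built the ss_compare component by following this page: https://www.secretflow.org.cn/zh-CN/docs/secretpad-all-in-one/latest/mfbgum8vi3ngs4y9. I then tried to invoke the component directly through the Kuscia API.
However, sf_input_ids fails no matter what I try, even when the values match exactly what was used at data registration. The main error is: cnt of sf_input_ids doesn't match cnt of comp_def.inputs. The parameters passed in are as follows: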
{
"status": {
"code": 0,
"message": "success",
"details": [
]
},
"data": {
"job_id": "job-best-effort-compare17",
"initiator": "alice",
"max_parallelism": 2,
"tasks": [
{
"app_image": "sun-image",
"parties": [
{
"domain_id": "alice",
"role": ""
},
{
"domain_id": "bob",
"role": ""
}
],
"alias": "job-compare17",
"task_id": "job-compare17",
"dependencies": [
],
"task_input_config": "{\"sf_datasource_config\":{\"alice\":{\"id\":\"default-data-source\"},\"bob\":{\"id\":\"default-data-source\"}},\"sf_cluster_desc\":{\"parties\":[\"alice\",\"bob\"],\"devices\":[{\"name\":\"spu\",\"type\":\"spu\",\"parties\":[\"alice\",\"bob\"],\"config\":\"{\\\"runtime_config\\\":{\\\"protocol\\\":\\\"REF2K\\\",\\\"field\\\":\\\"FM64\\\"},\\\"link_desc\\\":{\\\"connect_retry_times\\\":60,\\\"connect_retry_interval_ms\\\":1000,\\\"brpc_channel_protocol\\\":\\\"http\\\",\\\"brpc_channel_connection_type\\\":\\\"pooled\\\",\\\"recv_timeout_ms\\\":1200000,\\\"http_timeout_ms\\\":1200000}}\"},{\"name\":\"heu\",\"type\":\"heu\",\"parties\":[\"alice\",\"bob\"],\"config\":\"{\\\"mode\\\": \\\"PHEU\\\", \\\"schema\\\": \\\"paillier\\\", \\\"key_size\\\": 2048}\"}],\"ray_fed_config\":{\"cross_silo_comm_backend\":\"brpc_link\"}},\"sf_node_eval_param\":{\"domain\":\"user\",\"name\":\"ss_compare\",\"version\":\"0.0.1\",\"attr_paths\":[\"input_table/alice_value/key\",\"input_table/bob_value/key\",\"tolerance\"],\"attrs\":[{\"ss\":[\"deposit_alice\"]},{\"ss\":[\"deposit_bob\"]},{\"i64\":\"10\"}]},\"sf_input_ids\":[\"alice-bank\",\"bob-bank\"],\"sf_output_ids\":[\"alice-output\",\"bob-output\"],\"sf_output_uris\":[\"alice-output.csv\",\"bob-output.csv\"]}",
"priority": 100
}
],
"status": {
"state": "Failed",
"err_msg": "",
"create_time": "2024-05-13T08:43:21Z",
"start_time": "2024-05-13T08:43:21Z",
"end_time": "2024-05-13T08:46:23Z",
"tasks": [
{
"task_id": "job-compare17",
"state": "Failed",
"err_msg": "The remaining no-failed party task counts 1 are less than the threshold 2 that meets the conditions for task success. pending party[], running party[bob], successful party[], failed party[alice]",
"create_time": "2024-05-13T08:43:21Z",
"start_time": "2024-05-13T08:43:21Z",
"end_time": "2024-05-13T08:46:23Z",
"parties": [
{
"domain_id": "bob",
"state": "Failed",
"err_msg": "",
"endpoints": [
{
"port_name": "fed",
"scope": "Cluster",
"endpoint": "job-compare17-0-fed.bob.svc"
},
{
"port_name": "global",
"scope": "Domain",
"endpoint": "job-compare17-0-global.bob.svc:23729"
},
{
"port_name": "spu",
"scope": "Cluster",
"endpoint": "job-compare17-0-spu.bob.svc"
}
]
},
{
"domain_id": "alice",
"state": "Failed",
"err_msg": "container[secretflow] terminated state reason \"Error\", message: \"WARNING:root:Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.\\n2024-05-13 08:45:05,879|alice|INFO|secretflow|entry.py:start_ray:59| ray_conf: RayConfig(ray_node_ip_address='job-compare17-0-global.alice.svc', ray_node_manager_port=0, ray_object_manager_port=0, ray_client_server_port=0, ray_worker_ports=[], ray_gcs_port=24049)\\n2024-05-13 08:45:05,880|alice|INFO|secretflow|entry.py:start_ray:63| Trying to start ray head node at job-compare17-0-global.alice.svc, start command: RAY_BACKEND_LOG_LEVEL=debug RAY_grpc_enable_http_proxy=true OMP_NUM_THREADS=12 ray start --head --include-dashboard=false --disable-usage-stats --num-cpus=32 --node-ip-address=job-compare17-0-global.alice.svc --port=24049\\n2024-05-13 08:45:45,836|alice|INFO|secretflow|entry.py:start_ray:80| 2024-05-13 08:45:09,525\\tINFO usage_lib.py:423 -- Usage stats collection is disabled.\\n2024-05-13 08:45:09,525\\tINFO scripts.py:744 -- Local node IP: job-compare17-0-global.alice.svc\\n2024-05-13 08:45:40,447\\tSUCC scripts.py:781 -- --------------------\\n2024-05-13 08:45:40,447\\tSUCC scripts.py:782 -- Ray runtime started.\\n2024-05-13 08:45:40,448\\tSUCC scripts.py:783 -- --------------------\\n2024-05-13 08:45:40,448\\tINFO scripts.py:785 -- Next steps\\n2024-05-13 08:45:40,448\\tINFO scripts.py:788 -- To add another node to this Ray cluster, run\\n2024-05-13 08:45:40,448\\tINFO scripts.py:791 -- ray start --address='job-compare17-0-global.alice.svc:24049'\\n2024-05-13 08:45:40,448\\tINFO scripts.py:800 -- To connect to this Ray cluster:\\n2024-05-13 08:45:40,448\\tINFO scripts.py:802 -- import ray\\n2024-05-13 08:45:40,448\\tINFO scripts.py:803 -- ray.init(_node_ip_address='job-compare17-0-global.alice.svc')\\n2024-05-13 08:45:40,448\\tINFO scripts.py:834 -- To terminate the Ray runtime, run\\n2024-05-13 08:45:40,448\\tINFO scripts.py:835 -- 
ray stop\\n2024-05-13 08:45:40,448\\tINFO scripts.py:838 -- To view the status of the cluster, use\\n2024-05-13 08:45:40,448\\tINFO scripts.py:839 -- ray status\\n\\n2024-05-13 08:45:45,836|alice|INFO|secretflow|entry.py:start_ray:81| Succeeded to start ray head node at job-compare17-0-global.alice.svc.\\n2024-05-13 08:45:45,837|alice|INFO|secretflow|entry.py:main:510| datasource.access_directly True\\nsf_node_eval_param {\\n \\\"domain\\\": \\\"user\\\",\\n \\\"name\\\": \\\"ss_compare\\\",\\n \\\"version\\\": \\\"0.0.1\\\",\\n \\\"attrPaths\\\": [\\n \\\"input_table/alice_value/key\\\",\\n \\\"input_table/bob_value/key\\\",\\n \\\"tolerance\\\"\\n ],\\n \\\"attrs\\\": [\\n {\\n \\\"ss\\\": [\\n \\\"deposit_alice\\\"\\n ]\\n },\\n {\\n \\\"ss\\\": [\\n \\\"deposit_bob\\\"\\n ]\\n },\\n {\\n \\\"i64\\\": \\\"10\\\"\\n }\\n ]\\n} \\nTraceback (most recent call last):\\n File \\\"/usr/local/lib/python3.10/runpy.py\\\", line 196, in _run_module_as_main\\n return _run_code(code, main_globals, None,\\n File \\\"/usr/local/lib/python3.10/runpy.py\\\", line 86, in _run_code\\n exec(code, run_globals)\\n File \\\"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\\\", line 547, in <module>\\n main()\\n File \\\"/usr/local/lib/python3.10/site-packages/click/core.py\\\", line 1157, in __call__\\n return self.main(*args, **kwargs)\\n File \\\"/usr/local/lib/python3.10/site-packages/click/core.py\\\", line 1078, in main\\n rv = self.invoke(ctx)\\n File \\\"/usr/local/lib/python3.10/site-packages/click/core.py\\\", line 1434, in invoke\\n return ctx.invoke(self.callback, **ctx.params)\\n File \\\"/usr/local/lib/python3.10/site-packages/click/core.py\\\", line 783, in invoke\\n return __callback(*args, **kwargs)\\n File \\\"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\\\", line 514, in main\\n sf_node_eval_param = preprocess_sf_node_eval_param(\\n File \\\"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\\\", line 273, in 
preprocess_sf_node_eval_param\\n assert len(comp_def.inputs) == len(\\nAssertionError: cnt of sf_input_ids doesn't match cnt of comp_def.inputs.\\n\"",
"endpoints": [
{
"port_name": "fed",
"scope": "Cluster",
"endpoint": "job-compare17-0-fed.alice.svc"
},
{
"port_name": "global",
"scope": "Domain",
"endpoint": "job-compare17-0-global.alice.svc:24049"
},
{
"port_name": "spu",
"scope": "Cluster",
"endpoint": "job-compare17-0-spu.alice.svc"
}
]
}
]
}
],
"stage_status_list": [
{
"domain_id": "alice",
"state": "JobCreateStageSucceeded"
},
{
"domain_id": "bob",
"state": "JobCreateStageSucceeded"
}
],
"approve_status_list": [
{
"domain_id": "alice",
"state": "JobAccepted"
},
{
"domain_id": "bob",
"state": "JobAccepted"
}
]
},
"custom_fields": {
}
}
}
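For readers trying to reproduce this, the check that fails compares the number of entries in sf_input_ids with the number of inputs declared by the component. Below is a minimal sketch of the same comparison run locally against the task_input_config string. The function name check_input_count and the declared_inputs argument are hypothetical; the declared count has to come from your own component definition, since it is not present in the payload. The attr_paths here all begin with input_table, which may suggest a single declared input, but that is an inference from this payload and not something the logs confirm.

```python
import json


def check_input_count(task_input_config: str, declared_inputs: int) -> dict:
    """Decode task_input_config and compare len(sf_input_ids) with declared_inputs.

    declared_inputs is supplied by the caller (read it from your own component
    definition); it is not derived from the payload.
    """
    # task_input_config is a JSON document stored as a string, so decode it first.
    cfg = json.loads(task_input_config)
    param = cfg["sf_node_eval_param"]
    input_ids = cfg.get("sf_input_ids", [])
    return {
        "component": "{}/{}:{}".format(param["domain"], param["name"], param["version"]),
        "attr_paths": param.get("attr_paths", []),
        "sf_input_ids": input_ids,
        "declared_inputs": declared_inputs,
        "counts_match": len(input_ids) == declared_inputs,
    }


# Trimmed copy of the payload from this issue (cluster and datasource fields omitted).
example = json.dumps({
    "sf_node_eval_param": {
        "domain": "user",
        "name": "ss_compare",
        "version": "0.0.1",
        "attr_paths": [
            "input_table/alice_value/key",
            "input_table/bob_value/key",
            "tolerance",
        ],
        "attrs": [{"ss": ["deposit_alice"]}, {"ss": ["deposit_bob"]}, {"i64": "10"}],
    },
    "sf_input_ids": ["alice-bank", "bob-bank"],
})

# declared_inputs=1 is a hypothetical value used only for illustration.
report = check_input_count(example, declared_inputs=1)
print(report["sf_input_ids"], report["counts_match"])
```

If counts_match is false for the count your component really declares, the job would be expected to hit the same assertion inside the container.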
Kuscia log output.
The error logs are as follows:
# alice kuscia.log
2024-05-13 16:46:14.056 INFO status/status_manager.go:625 Patch status for pod "job-compare17-0_alice(ddc7c837-1cb5-4cf1-8e03-d3cb1e43173d)", patch={"metadata":{"uid":"ddc7c837-1cb5-4cf1-8e03-d3cb1e43173d"},"status":{"$setElementOrder/conditions":[{"type":"Initialized"},{"type":"Ready"},{"type":"ContainersReady"},{"type":"PodScheduled"}],"conditions":[{"lastTransitionTime":"2024-05-13T08:46:13Z","reason":"PodFailed","status":"False","type":"Ready"},{"lastTransitionTime":"2024-05-13T08:46:13Z","reason":"PodFailed","status":"False","type":"ContainersReady"}],"containerStatuses":[{"containerID":"containerd://0ba5e0b1a55f44d5a4d9029dea216e4c0e9e055c9deb17c8e1cd78b5a7c26663","image":"docker.io/secretflow/sf-dev-anolis8:cmp","imageID":"sha256:57020aff9a5bd6f3973c310628c72fc231e254e278df5786917330ea6ea7fc74","lastState":{},"name":"secretflow","ready":false,"restartCount":0,"started":false,"state":{"terminated":{"containerID":"containerd://0ba5e0b1a55f44d5a4d9029dea216e4c0e9e055c9deb17c8e1cd78b5a7c26663","exitCode":1,"finishedAt":"2024-05-13T08:46:08Z","message":"WARNING:root:Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.\n2024-05-13 08:45:05,879|alice|INFO|secretflow|entry.py:start_ray:59| ray_conf: RayConfig(ray_node_ip_address='job-compare17-0-global.alice.svc', ray_node_manager_port=0, ray_object_manager_port=0, ray_client_server_port=0, ray_worker_ports=[], ray_gcs_port=24049)\n2024-05-13 08:45:05,880|alice|INFO|secretflow|entry.py:start_ray:63| Trying to start ray head node at job-compare17-0-global.alice.svc, start command: RAY_BACKEND_LOG_LEVEL=debug RAY_grpc_enable_http_proxy=true OMP_NUM_THREADS=12 ray start --head --include-dashboard=false --disable-usage-stats --num-cpus=32 --node-ip-address=job-compare17-0-global.alice.svc --port=24049\n2024-05-13 08:45:45,836|alice|INFO|secretflow|entry.py:start_ray:80| 2024-05-13 08:45:09,525\tINFO usage_lib.py:423 -- Usage stats collection is 
disabled.\n2024-05-13 08:45:09,525\tINFO scripts.py:744 -- Local node IP: job-compare17-0-global.alice.svc\n2024-05-13 08:45:40,447\tSUCC scripts.py:781 -- --------------------\n2024-05-13 08:45:40,447\tSUCC scripts.py:782 -- Ray runtime started.\n2024-05-13 08:45:40,448\tSUCC scripts.py:783 -- --------------------\n2024-05-13 08:45:40,448\tINFO scripts.py:785 -- Next steps\n2024-05-13 08:45:40,448\tINFO scripts.py:788 -- To add another node to this Ray cluster, run\n2024-05-13 08:45:40,448\tINFO scripts.py:791 -- ray start --address='job-compare17-0-global.alice.svc:24049'\n2024-05-13 08:45:40,448\tINFO scripts.py:800 -- To connect to this Ray cluster:\n2024-05-13 08:45:40,448\tINFO scripts.py:802 -- import ray\n2024-05-13 08:45:40,448\tINFO scripts.py:803 -- ray.init(_node_ip_address='job-compare17-0-global.alice.svc')\n2024-05-13 08:45:40,448\tINFO scripts.py:834 -- To terminate the Ray runtime, run\n2024-05-13 08:45:40,448\tINFO scripts.py:835 -- ray stop\n2024-05-13 08:45:40,448\tINFO scripts.py:838 -- To view the status of the cluster, use\n2024-05-13 08:45:40,448\tINFO scripts.py:839 -- ray status\n\n2024-05-13 08:45:45,836|alice|INFO|secretflow|entry.py:start_ray:81| Succeeded to start ray head node at job-compare17-0-global.alice.svc.\n2024-05-13 08:45:45,837|alice|INFO|secretflow|entry.py:main:510| datasource.access_directly True\nsf_node_eval_param {\n \"domain\": \"user\",\n \"name\": \"ss_compare\",\n \"version\": \"0.0.1\",\n \"attrPaths\": [\n \"input_table/alice_value/key\",\n \"input_table/bob_value/key\",\n \"tolerance\"\n ],\n \"attrs\": [\n {\n \"ss\": [\n \"deposit_alice\"\n ]\n },\n {\n \"ss\": [\n \"deposit_bob\"\n ]\n },\n {\n \"i64\": \"10\"\n }\n ]\n} \nTraceback (most recent call last):\n File \"/usr/local/lib/python3.10/runpy.py\", line 196, in _run_module_as_main\n return _run_code(code, main_globals, None,\n File \"/usr/local/lib/python3.10/runpy.py\", line 86, in _run_code\n exec(code, run_globals)\n File 
\"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\", line 547, in \u003cmodule\u003e\n main()\n File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 1157, in __call__\n return self.main(*args, **kwargs)\n File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 1078, in main\n rv = self.invoke(ctx)\n File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 1434, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 783, in invoke\n return __callback(*args, **kwargs)\n File \"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\", line 514, in main\n sf_node_eval_param = preprocess_sf_node_eval_param(\n File \"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\", line 273, in preprocess_sf_node_eval_param\n assert len(comp_def.inputs) == len(\nAssertionError: cnt of sf_input_ids doesn't match cnt of comp_def.inputs.\n","reason":"Error","startedAt":"2024-05-13T08:43:32Z"}}}]}}
# bob kuscia.log
2024-05-13 16:46:25.908 INFO status/status_manager.go:625 Patch status for pod "job-compare17-0_bob(1cb33857-f297-4262-b857-782edf209123)", patch={"metadata":{"uid":"1cb33857-f297-4262-b857-782edf209123"},"status":{"$setElementOrder/conditions":[{"type":"Initialized"},{"type":"Ready"},{"type":"ContainersReady"},{"type":"PodScheduled"}],"conditions":[{"lastTransitionTime":"2024-05-13T08:46:25Z","reason":"PodFailed","status":"False","type":"Ready"},{"lastTransitionTime":"2024-05-13T08:46:25Z","reason":"PodFailed","status":"False","type":"ContainersReady"}],"containerStatuses":[{"containerID":"containerd://26accb69b7457e6430d024a662a0c57895555e0b44d116cbb3a2c9af1f9bb092","image":"docker.io/secretflow/sf-dev-anolis8:cmp","imageID":"sha256:57020aff9a5bd6f3973c310628c72fc231e254e278df5786917330ea6ea7fc74","lastState":{},"name":"secretflow","ready":false,"restartCount":0,"started":false,"state":{"terminated":{"containerID":"containerd://26accb69b7457e6430d024a662a0c57895555e0b44d116cbb3a2c9af1f9bb092","exitCode":143,"finishedAt":"2024-05-13T08:46:23Z","message":"WARNING:root:Since the GPL-licensed package `unidecode` is not installed, using Python's `unicodedata` package which yields worse results.\n2024-05-13 08:45:09,024|bob|INFO|secretflow|entry.py:start_ray:59| ray_conf: RayConfig(ray_node_ip_address='job-compare17-0-global.bob.svc', ray_node_manager_port=0, ray_object_manager_port=0, ray_client_server_port=0, ray_worker_ports=[], ray_gcs_port=23729)\n2024-05-13 08:45:09,024|bob|INFO|secretflow|entry.py:start_ray:63| Trying to start ray head node at job-compare17-0-global.bob.svc, start command: RAY_BACKEND_LOG_LEVEL=debug RAY_grpc_enable_http_proxy=true OMP_NUM_THREADS=12 ray start --head --include-dashboard=false --disable-usage-stats --num-cpus=32 --node-ip-address=job-compare17-0-global.bob.svc --port=23729\n2024-05-13 08:46:23,109|bob|INFO|secretflow|entry.py:start_ray:80| 2024-05-13 08:45:13,738\tINFO usage_lib.py:423 -- Usage stats collection is 
disabled.\n2024-05-13 08:45:13,738\tINFO scripts.py:744 -- Local node IP: job-compare17-0-global.bob.svc\n2024-05-13 08:46:22,065\tSUCC scripts.py:781 -- --------------------\n2024-05-13 08:46:22,065\tSUCC scripts.py:782 -- Ray runtime started.\n2024-05-13 08:46:22,065\tSUCC scripts.py:783 -- --------------------\n2024-05-13 08:46:22,066\tINFO scripts.py:785 -- Next steps\n2024-05-13 08:46:22,066\tINFO scripts.py:788 -- To add another node to this Ray cluster, run\n2024-05-13 08:46:22,066\tINFO scripts.py:791 -- ray start --address='job-compare17-0-global.bob.svc:23729'\n2024-05-13 08:46:22,066\tINFO scripts.py:800 -- To connect to this Ray cluster:\n2024-05-13 08:46:22,066\tINFO scripts.py:802 -- import ray\n2024-05-13 08:46:22,066\tINFO scripts.py:803 -- ray.init(_node_ip_address='job-compare17-0-global.bob.svc')\n2024-05-13 08:46:22,066\tINFO scripts.py:834 -- To terminate the Ray runtime, run\n2024-05-13 08:46:22,066\tINFO scripts.py:835 -- ray stop\n2024-05-13 08:46:22,066\tINFO scripts.py:838 -- To view the status of the cluster, use\n2024-05-13 08:46:22,066\tINFO scripts.py:839 -- ray status\n\n2024-05-13 08:46:23,110|bob|INFO|secretflow|entry.py:start_ray:81| Succeeded to start ray head node at job-compare17-0-global.bob.svc.\n2024-05-13 08:46:23,111|bob|INFO|secretflow|entry.py:main:510| datasource.access_directly True\nsf_node_eval_param {\n \"domain\": \"user\",\n \"name\": \"ss_compare\",\n \"version\": \"0.0.1\",\n \"attrPaths\": [\n \"input_table/alice_value/key\",\n \"input_table/bob_value/key\",\n \"tolerance\"\n ],\n \"attrs\": [\n {\n \"ss\": [\n \"deposit_alice\"\n ]\n },\n {\n \"ss\": [\n \"deposit_bob\"\n ]\n },\n {\n \"i64\": \"10\"\n }\n ]\n} \nTraceback (most recent call last):\n File \"/usr/local/lib/python3.10/runpy.py\", line 196, in _run_module_as_main\n return _run_code(code, main_globals, None,\n File \"/usr/local/lib/python3.10/runpy.py\", line 86, in _run_code\n exec(code, run_globals)\n File 
\"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\", line 547, in \u003cmodule\u003e\n main()\n File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 1157, in __call__\n return self.main(*args, **kwargs)\n File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 1078, in main\n rv = self.invoke(ctx)\n File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 1434, in invoke\n return ctx.invoke(self.callback, **ctx.params)\n File \"/usr/local/lib/python3.10/site-packages/click/core.py\", line 783, in invoke\n return __callback(*args, **kwargs)\n File \"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\", line 514, in main\n sf_node_eval_param = preprocess_sf_node_eval_param(\n File \"/usr/local/lib/python3.10/site-packages/secretflow/kuscia/entry.py\", line 273, in preprocess_sf_node_eval_param\n assert len(comp_def.inputs) == len(\nAssertionError: cnt of sf_input_ids doesn't match cnt of comp_def.inputs.\n","reason":"Error","startedAt":"2024-05-13T08:43:31Z"}}}],"phase":"Failed","podIP":null,"podIPs":null}}
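The status patches above bury the actual failure at the end of a long escaped message. As a rough sketch, the last non-empty line of the terminated container's message can be pulled out of such a log entry as shown below. The helper name last_error_line is hypothetical, and the sketch assumes the whole log entry sits on one line with the JSON patch following "patch=", as in the entries quoted in this issue.

```python
import json


def last_error_line(log_line: str) -> str:
    """Return the last non-empty line of the first terminated container message."""
    # Everything after the first "patch=" is assumed to be one JSON document.
    _, found, patch_text = log_line.partition("patch=")
    if not found:
        return ""
    patch = json.loads(patch_text)
    for status in patch.get("status", {}).get("containerStatuses", []):
        terminated = status.get("state", {}).get("terminated")
        if terminated:
            lines = [ln for ln in terminated.get("message", "").splitlines() if ln.strip()]
            return lines[-1] if lines else ""
    return ""


# Shortened stand-in for an entry from kuscia.log; the pod name "demo" is made up.
sample_patch = {
    "status": {
        "containerStatuses": [
            {
                "name": "secretflow",
                "state": {
                    "terminated": {
                        "exitCode": 1,
                        "message": (
                            "Traceback (most recent call last):\n"
                            "AssertionError: cnt of sf_input_ids doesn't match "
                            "cnt of comp_def.inputs.\n"
                        ),
                        "reason": "Error",
                    }
                },
            }
        ]
    }
}
sample_line = (
    '2024-05-13 16:46:14.056 INFO status/status_manager.go:625 '
    'Patch status for pod "demo", patch=' + json.dumps(sample_patch)
)
print(last_error_line(sample_line))
```

Applied to both entries above, this would surface the same AssertionError for alice and bob, which matches the tracebacks in the messages.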
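Hello, the latest stable Kuscia release is 0.7.0b0; please use the stable release. The official 0.8 release will be published in the next few days.
If you have any questions while using it, feel free to share feedback.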
Stale issue message. Please comment to remove stale tag. Otherwise this issue will be closed soon.