FATE
ValueError(f"retcode err, callback result is {result}")
Describe the bug
2023-04-12 01:56:33.810 | ERROR | main:
File "/usr/local/lib/python3.8/site-packages/pipeline/utils/invoker/job_submitter.py", line 46, in submit_job raise ValueError(f"retcode err, callback result is {result}")
ValueError: retcode err, callback result is {'jobId': '202304120156159254720', 'retcode': 103, 'retmsg': 'Traceback (most recent call last):\n File "/data/projects/fate/fateflow/python/fate_flow/scheduler/dag_scheduler.py", line 142, in submit\n raise Exception("create job failed", response)\nException: ('create job failed', {'guest': {9999: {'data': {'components': {'dataio_0': {'need_run': True}, 'dataio_1': {'need_run': True}, 'evaluation_0': {'need_run': True}, 'homo_lr_0': {'need_run': True}, 'reader_0': {'need_run': True}, 'reader_1': {'need_run': True}}}, 'retcode': 0, 'retmsg': 'success'}}, 'host': {10000: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "\n[Roll Site Error TransInfo] \n location msg=operation POST not supported \n stack info=scala.NotImplementedError: operation POST not supported\n\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\n\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\n\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\n\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\n\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\n\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\n\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\n\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\n\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\n\tat 
io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\n\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\n \n\nexception trans path: rollsite(9999)"\n\tdebug_error_string = "{"created":"@1681264583.445007587","description":"Error received from peer ipv4:10.99.68.60:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\n[Roll Site Error TransInfo] \\n location msg=operation POST not supported \\n stack info=scala.NotImplementedError: operation POST not supported\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\\n\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\n\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\\n\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\n\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\n\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\n\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\n\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\n\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\\n\\tat 
io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\n\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\n\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\n\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\n\\tat java.lang.Thread.run(Thread.java:750)\\n \\n\\nexception trans path: rollsite(9999)","grpc_status":2}"\n>'}}, 'arbiter': {10000: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "\n[Roll Site Error TransInfo] \n location msg=operation POST not supported \n stack info=scala.NotImplementedError: operation POST not supported\n\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\n\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\n\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\n\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\n\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\n\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\n\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\n\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\n\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\n\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\n\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\n\tat 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\n \n\nexception trans path: rollsite(9999)"\n\tdebug_error_string = "{"created":"@1681264589.490125154","description":"Error received from peer ipv4:10.99.68.60:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\n[Roll Site Error TransInfo] \\n location msg=operation POST not supported \\n stack info=scala.NotImplementedError: operation POST not supported\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\\n\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\n\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\\n\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\n\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\n\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\n\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\n\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\n\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\\n\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\n\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\n\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\n\\tat 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\n\\tat java.lang.Thread.run(Thread.java:750)\\n \\n\\nexception trans path: rollsite(9999)","grpc_status":2}"\n>'}}})\n'}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
│ │ └ {'name': 'main', 'doc': 'Entry point for launching an IPython kernel.\n\nThis is separate from the ipykernel pack...
│ └ <code object
File "/tmp/ipykernel_16/2110458342.py", line 30, in
pipeline.fit(job_parameters) │ │ └ <pipeline.runtime.entity.JobParameters object at 0x7f48e0e80490> │ └ <function PipeLine.fit at 0x7f4920be10d0> └ <pipeline.backend.pipeline.PipeLine object at 0x7f48e0e83340>
File "/usr/local/lib/python3.8/site-packages/pipeline/backend/pipeline.py", line 568, in fit self._train_job_id, detail_info = self._job_invoker.submit_job(self._train_dsl, training_conf, callback_func) │ │ │ │ │ │ │ │ └ None │ │ │ │ │ │ │ └ {'dsl_version': 2, 'initiator': {'role': 'guest', 'party_id': 9999}, 'role': {'guest': [9999], 'host': [10000], 'arbiter': [1... │ │ │ │ │ │ └ {'components': {'reader_0': {'module': 'Reader', 'output': {'data': ['data']}, 'provider': 'fate_flow'}, 'reader_1': {'module... │ │ │ │ │ └ <pipeline.backend.pipeline.PipeLine object at 0x7f48e0e83340> │ │ │ │ └ <function JobInvoker.submit_job at 0x7f4920bda310> │ │ │ └ <pipeline.utils.invoker.job_submitter.JobInvoker object at 0x7f48e0e83370> │ │ └ <pipeline.backend.pipeline.PipeLine object at 0x7f48e0e83340> │ └ None └ <pipeline.backend.pipeline.PipeLine object at 0x7f48e0e83340> File "/usr/local/lib/python3.8/site-packages/pipeline/utils/invoker/job_submitter.py", line 54, in submit_job raise ValueError("job submit failed, err msg: {}".format(result)) └ {'jobId': '202304120156159254720', 'retcode': 103, 'retmsg': 'Traceback (most recent call last):\n File "/data/projects/fate...
ValueError: job submit failed, err msg: {'jobId': '202304120156159254720', 'retcode': 103, 'retmsg': 'Traceback (most recent call last):\n File "/data/projects/fate/fateflow/python/fate_flow/scheduler/dag_scheduler.py", line 142, in submit\n raise Exception("create job failed", response)\nException: ('create job failed', {'guest': {9999: {'data': {'components': {'dataio_0': {'need_run': True}, 'dataio_1': {'need_run': True}, 'evaluation_0': {'need_run': True}, 'homo_lr_0': {'need_run': True}, 'reader_0': {'need_run': True}, 'reader_1': {'need_run': True}}}, 'retcode': 0, 'retmsg': 'success'}}, 'host': {10000: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "\n[Roll Site Error TransInfo] \n location msg=operation POST not supported \n stack info=scala.NotImplementedError: operation POST not supported\n\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\n\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\n\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\n\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\n\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\n\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\n\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\n\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\n\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\n\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\n\tat 
io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\n \n\nexception trans path: rollsite(9999)"\n\tdebug_error_string = "{"created":"@1681264583.445007587","description":"Error received from peer ipv4:10.99.68.60:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\n[Roll Site Error TransInfo] \\n location msg=operation POST not supported \\n stack info=scala.NotImplementedError: operation POST not supported\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\\n\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\n\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\\n\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\n\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\n\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\n\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\n\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\n\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\\n\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\n\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\n\\tat 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\n\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\n\\tat java.lang.Thread.run(Thread.java:750)\\n \\n\\nexception trans path: rollsite(9999)","grpc_status":2}"\n>'}}, 'arbiter': {10000: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "\n[Roll Site Error TransInfo] \n location msg=operation POST not supported \n stack info=scala.NotImplementedError: operation POST not supported\n\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\n\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\n\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\n\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\n\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\n\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\n\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\n\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\n\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\n\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\n\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\n \n\nexception trans path: rollsite(9999)"\n\tdebug_error_string = "{"created":"@1681264589.490125154","description":"Error received from peer ipv4:10.99.68.60:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\n[Roll Site Error TransInfo] \\n location msg=operation POST not supported \\n stack info=scala.NotImplementedError: operation POST not supported\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\\n\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\n\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\\n\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\n\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\n\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\n\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\n\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\n\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\\n\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\n\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\n\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\n\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\n\\tat java.lang.Thread.run(Thread.java:750)\\n \\n\\nexception trans path: 
rollsite(9999)","grpc_status":2}"\n>'}}})\n'}
To Reproduce
Steps to reproduce the behavior:
- Go to 'https://github.com/vsphere-tmm/KubeFATE-on-VCF/blob/main/credit_card_use_case/homo_lr_guest.ipynb'
- Run this cell 'Define the Components of Training Pipeline'
# initialize pipeline
pipeline = PipeLine()
# set job initiator
pipeline.set_initiator(role="guest", party_id=guest)
# set participants information
pipeline.set_roles(guest=guest, host=host, arbiter=arbiter)
# add components to pipeline, in order of task execution
pipeline.add_component(reader_0)
pipeline.add_component(reader_1)
pipeline.add_component(dataio_0, data=Data(data=reader_0.output.data))
pipeline.add_component(dataio_1, data=Data(data=reader_1.output.data), model=Model(dataio_0.output.model))
homo_lr_0 = HomoLR(name='homo_lr_0', **param)
pipeline.add_component(homo_lr_0, data=Data(train_data=dataio_0.output.data, validate_data=dataio_1.output.data))
evaluation_0 = Evaluation(name="evaluation_0", eval_type="binary")
evaluation_0.get_party_instance(role='host', party_id=host).component_param(need_run=False)
pipeline.add_component(evaluation_0, data=Data(data=homo_lr_0.output.data))
# compile pipeline once finished adding modules, this step will form conf and dsl files for running job
pipeline.compile()
# fit model
job_parameters = JobParameters(backend=backend, work_mode=work_mode)
pipeline.fit(job_parameters)
# query component summary
import json
print(json.dumps(pipeline.get_component("homo_lr_0").get_summary(), indent=4))
- See error: you will see the above error
Expected behavior
How do I submit the pipeline for training, and how can this error be rectified?
Desktop (please complete the following information):
- OS: Windows
- Browser: Chrome
- Using KubeFATE v1.9.0
- Ingress controller v1.1.3
- Kubernetes v1.23.5
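I hit exactly the same problem when running the Homo-NN-Quick-Start code. Does anyone know how to fix it?
Same problem with KubeFATE 1.8.0: communication between the rollsites fails with "operation POST not supported".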
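For anyone triaging a trace like the one above, the useful parts are buried in a very long `retmsg` string. Below is a minimal sketch that pulls out the short fields that appear in this report (the "location msg", the "exception trans path", and the peer address). The function name `summarize_rollsite_error` and the shortened `sample` string are hypothetical and only for illustration; they are not part of FATE or KubeFATE, and the patterns are based solely on the text shown in this issue.

```python
import re


def summarize_rollsite_error(retmsg: str) -> dict:
    """Extract the short, human-readable fields from a long RollSite error string.

    Hypothetical helper: the patterns only cover the message shapes seen in this issue.
    """
    # "location msg=..." ends at a real newline or at an escaped "\n" sequence.
    location = re.search(r"location msg=(.*?)\s*(?:\\n|\n)", retmsg)
    # e.g. "exception trans path: rollsite(9999)"
    path = re.search(r"exception trans path: ([\w()]+)", retmsg)
    # e.g. "Error received from peer ipv4:10.99.68.60:9370"
    peer = re.search(r"Error received from peer (ipv4:[\d.]+:\d+)", retmsg)
    return {
        "location_msg": location.group(1) if location else None,
        "trans_path": path.group(1) if path else None,
        "peer": peer.group(1) if peer else None,
    }


# Shortened sample assembled from fragments of the log in this report.
sample = (
    "[Roll Site Error TransInfo] \n location msg=operation POST not supported \n"
    " stack info=scala.NotImplementedError: operation POST not supported\n"
    "exception trans path: rollsite(9999)\n"
    'debug_error_string = "{"description":"Error received from peer ipv4:10.99.68.60:9370"}"'
)

if __name__ == "__main__":
    print(summarize_rollsite_error(sample))
```

Run against the full `retmsg` from a failed job, this shows at a glance which rollsite rejected the request and which peer address was contacted, which may help when comparing reports across KubeFATE versions.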