
ValueError(f"retcode err, callback result is {result}")

Open Mansi2487 opened this issue 1 year ago • 2 comments

Describe the bug

2023-04-12 01:56:33.810 | ERROR | __main__:<module>:30 - An error has been caught in function '<module>', process 'MainProcess' (16), thread 'MainThread' (139953891792704):
Traceback (most recent call last):

File "/usr/local/lib/python3.8/site-packages/pipeline/utils/invoker/job_submitter.py", line 46, in submit_job
    raise ValueError(f"retcode err, callback result is {result}")

ValueError: retcode err, callback result is {'jobId': '202304120156159254720', 'retcode': 103, 'retmsg': 'Traceback (most recent call last):\n File "/data/projects/fate/fateflow/python/fate_flow/scheduler/dag_scheduler.py", line 142, in submit\n raise Exception("create job failed", response)\nException: ('create job failed', {'guest': {9999: {'data': {'components': {'dataio_0': {'need_run': True}, 'dataio_1': {'need_run': True}, 'evaluation_0': {'need_run': True}, 'homo_lr_0': {'need_run': True}, 'reader_0': {'need_run': True}, 'reader_1': {'need_run': True}}}, 'retcode': 0, 'retmsg': 'success'}}, 'host': {10000: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "\n[Roll Site Error TransInfo] \n location msg=operation POST not supported \n stack info=scala.NotImplementedError: operation POST not supported\n\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\n\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\n\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\n\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\n\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\n\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\n\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\n\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\n\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\n\tat 
io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\n\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\n \n\nexception trans path: rollsite(9999)"\n\tdebug_error_string = "{"created":"@1681264583.445007587","description":"Error received from peer ipv4:10.99.68.60:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\n[Roll Site Error TransInfo] \\n location msg=operation POST not supported \\n stack info=scala.NotImplementedError: operation POST not supported\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\\n\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\n\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\\n\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\n\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\n\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\n\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\n\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\n\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\\n\\tat 
io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\n\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\n\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\n\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\n\\tat java.lang.Thread.run(Thread.java:750)\\n \\n\\nexception trans path: rollsite(9999)","grpc_status":2}"\n>'}}, 'arbiter': {10000: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "\n[Roll Site Error TransInfo] \n location msg=operation POST not supported \n stack info=scala.NotImplementedError: operation POST not supported\n\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\n\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\n\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\n\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\n\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\n\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\n\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\n\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\n\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\n\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\n\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\n\tat 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\n \n\nexception trans path: rollsite(9999)"\n\tdebug_error_string = "{"created":"@1681264589.490125154","description":"Error received from peer ipv4:10.99.68.60:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\n[Roll Site Error TransInfo] \\n location msg=operation POST not supported \\n stack info=scala.NotImplementedError: operation POST not supported\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\\n\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\n\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\\n\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\n\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\n\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\n\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\n\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\n\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\\n\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\n\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\n\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\n\\tat 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\n\\tat java.lang.Thread.run(Thread.java:750)\\n \\n\\nexception trans path: rollsite(9999)","grpc_status":2}"\n>'}}})\n'}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main return _run_code(code, main_globals, None, │ │ └ {'name': 'main', 'doc': 'Entry point for launching an IPython kernel.\n\nThis is separate from the ipykernel pack... │ └ <code object at 0x7f498dde0be0, file "/usr/local/lib/python3.8/site-packages/ipykernel_launcher.py", line 1> └ <function _run_code at 0x7f498ddaddc0> File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code exec(code, run_globals) │ └ {'name': 'main', 'doc': 'Entry point for launching an IPython kernel.\n\nThis is separate from the ipykernel pack... └ <code object at 0x7f498dde0be0, file "/usr/local/lib/python3.8/site-packages/ipykernel_launcher.py", line 1> File "/usr/local/lib/python3.8/site-packages/ipykernel_launcher.py", line 17, in app.launch_new_instance() │ └ <bound method Application.launch_instance of <class 'ipykernel.kernelapp.IPKernelApp'>> └ <module 'ipykernel.kernelapp' from '/usr/local/lib/python3.8/site-packages/ipykernel/kernelapp.py'> File "/usr/local/lib/python3.8/site-packages/traitlets/config/application.py", line 978, in launch_instance app.start() │ └ <function IPKernelApp.start at 0x7f4989826040> └ <ipykernel.kernelapp.IPKernelApp object at 0x7f498ded2be0> File "/usr/local/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 712, in start self.io_loop.start() │ │ └ <function BaseAsyncIOLoop.start at 0x7f498b6c0790> │ └ <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7f49897b2d60> └ <ipykernel.kernelapp.IPKernelApp object at 0x7f498ded2be0> File "/usr/local/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 215, in start self.asyncio_loop.run_forever() │ │ └ <function BaseEventLoop.run_forever at 0x7f498d5e7c10> │ └ <_UnixSelectorEventLoop running=True closed=False debug=False> └ <tornado.platform.asyncio.AsyncIOMainLoop object at 0x7f49897b2d60> File "/usr/local/lib/python3.8/asyncio/base_events.py", line 570, in run_forever self._run_once() │ └ <function 
BaseEventLoop._run_once at 0x7f498d5e9790> └ <_UnixSelectorEventLoop running=True closed=False debug=False> File "/usr/local/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once handle._run() │ └ <function Handle._run at 0x7f498d6dd550> └ <Handle <TaskWakeupMethWrapper object at 0x7f48e0f26430>(<Future finis...670>, ...],))>)> File "/usr/local/lib/python3.8/asyncio/events.py", line 81, in _run self._context.run(self._callback, *self._args) │ │ │ │ │ └ <member '_args' of 'Handle' objects> │ │ │ │ └ <Handle <TaskWakeupMethWrapper object at 0x7f48e0f26430>(<Future finis...670>, ...],))>)> │ │ │ └ <member '_callback' of 'Handle' objects> │ │ └ <Handle <TaskWakeupMethWrapper object at 0x7f48e0f26430>(<Future finis...670>, ...],))>)> │ └ <member '_context' of 'Handle' objects> └ <Handle <TaskWakeupMethWrapper object at 0x7f48e0f26430>(<Future finis...670>, ...],))>)> File "/usr/local/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 510, in dispatch_queue await self.process_one() │ └ <function Kernel.process_one at 0x7f498a390430> └ <ipykernel.ipkernel.IPythonKernel object at 0x7f4988754790> File "/usr/local/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 499, in process_one await dispatch(*args) │ └ ([<zmq.sugar.frame.Frame object at 0x7f48e15c9670>, <zmq.sugar.frame.Frame object at 0x7f48e15c9bf0>, <zmq.sugar.frame.Frame ... 
└ <bound method Kernel.dispatch_shell of <ipykernel.ipkernel.IPythonKernel object at 0x7f4988754790>> File "/usr/local/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 406, in dispatch_shell await result └ <coroutine object Kernel.execute_request at 0x7f48e21fbe40> File "/usr/local/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 730, in execute_request reply_content = await reply_content └ <coroutine object IPythonKernel.do_execute at 0x7f48e0f84240> File "/usr/local/lib/python3.8/site-packages/ipykernel/ipkernel.py", line 383, in do_execute res = shell.run_cell( │ └ <function ZMQInteractiveShell.run_cell at 0x7f49898078b0> └ <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f4988754d00> File "/usr/local/lib/python3.8/site-packages/ipykernel/zmqshell.py", line 528, in run_cell return super().run_cell(*args, **kwargs) │ └ {'store_history': True, 'silent': False, 'cell_id': None} └ ('# initialize pipeline\npipeline = PipeLine()\n\n# set job initiator\npipeline.set_initiator(role="guest", party_id=guest)\n... 
File "/usr/local/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2885, in run_cell result = self._run_cell( │ └ <function InteractiveShell._run_cell at 0x7f498af54a60> └ <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f4988754d00> File "/usr/local/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2940, in _run_cell return runner(coro) │ └ <coroutine object InteractiveShell.run_cell_async at 0x7f48e0fb4f40> └ <function _pseudo_sync_runner at 0x7f498af434c0> File "/usr/local/lib/python3.8/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner coro.send(None) │ └ <method 'send' of 'coroutine' objects> └ <coroutine object InteractiveShell.run_cell_async at 0x7f48e0fb4f40> File "/usr/local/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3139, in run_cell_async has_raised = await self.run_ast_nodes(code_ast.body, cell_name, │ │ │ │ └ '/tmp/ipykernel_16/2110458342.py' │ │ │ └ [<_ast.Assign object at 0x7f48e0f82fa0>, <_ast.Expr object at 0x7f48e0f829a0>, <_ast.Expr object at 0x7f48e104c8e0>, <_ast.Ex... │ │ └ <ast.Module object at 0x7f48e0f82a00> │ └ <function InteractiveShell.run_ast_nodes at 0x7f498af54d30> └ <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f4988754d00> File "/usr/local/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3318, in run_ast_nodes if await self.run_code(code, result, async=asy): │ │ │ │ └ False │ │ │ └ <ExecutionResult object at 7f48e0f26610, execution_count=21 error_before_exec=None error_in_exec=None info=<ExecutionInfo obj... 
│ │ └ <code object at 0x7f48e0e969d0, file "/tmp/ipykernel_16/2110458342.py", line 30> │ └ <function InteractiveShell.run_code at 0x7f498af54dc0> └ <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f4988754d00> File "/usr/local/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3378, in run_code exec(code_obj, self.user_global_ns, self.user_ns) │ │ │ │ └ {'name': 'main', 'doc': 'Automatically created module for IPython interactive environment', 'package': None, ... │ │ │ └ <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f4988754d00> │ │ └ <property object at 0x7f498af44b30> │ └ <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7f4988754d00> └ <code object at 0x7f48e0e969d0, file "/tmp/ipykernel_16/2110458342.py", line 30>

File "/tmp/ipykernel_16/2110458342.py", line 30, in pipeline.fit(job_parameters) │ │ └ <pipeline.runtime.entity.JobParameters object at 0x7f48e0e80490> │ └ <function PipeLine.fit at 0x7f4920be10d0> └ <pipeline.backend.pipeline.PipeLine object at 0x7f48e0e83340>

File "/usr/local/lib/python3.8/site-packages/pipeline/backend/pipeline.py", line 568, in fit self._train_job_id, detail_info = self._job_invoker.submit_job(self._train_dsl, training_conf, callback_func) │ │ │ │ │ │ │ │ └ None │ │ │ │ │ │ │ └ {'dsl_version': 2, 'initiator': {'role': 'guest', 'party_id': 9999}, 'role': {'guest': [9999], 'host': [10000], 'arbiter': [1... │ │ │ │ │ │ └ {'components': {'reader_0': {'module': 'Reader', 'output': {'data': ['data']}, 'provider': 'fate_flow'}, 'reader_1': {'module... │ │ │ │ │ └ <pipeline.backend.pipeline.PipeLine object at 0x7f48e0e83340> │ │ │ │ └ <function JobInvoker.submit_job at 0x7f4920bda310> │ │ │ └ <pipeline.utils.invoker.job_submitter.JobInvoker object at 0x7f48e0e83370> │ │ └ <pipeline.backend.pipeline.PipeLine object at 0x7f48e0e83340> │ └ None └ <pipeline.backend.pipeline.PipeLine object at 0x7f48e0e83340> File "/usr/local/lib/python3.8/site-packages/pipeline/utils/invoker/job_submitter.py", line 54, in submit_job raise ValueError("job submit failed, err msg: {}".format(result)) └ {'jobId': '202304120156159254720', 'retcode': 103, 'retmsg': 'Traceback (most recent call last):\n File "/data/projects/fate...

ValueError: job submit failed, err msg: {'jobId': '202304120156159254720', 'retcode': 103, 'retmsg': 'Traceback (most recent call last):\n File "/data/projects/fate/fateflow/python/fate_flow/scheduler/dag_scheduler.py", line 142, in submit\n raise Exception("create job failed", response)\nException: ('create job failed', {'guest': {9999: {'data': {'components': {'dataio_0': {'need_run': True}, 'dataio_1': {'need_run': True}, 'evaluation_0': {'need_run': True}, 'homo_lr_0': {'need_run': True}, 'reader_0': {'need_run': True}, 'reader_1': {'need_run': True}}}, 'retcode': 0, 'retmsg': 'success'}}, 'host': {10000: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "\n[Roll Site Error TransInfo] \n location msg=operation POST not supported \n stack info=scala.NotImplementedError: operation POST not supported\n\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\n\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\n\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\n\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\n\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\n\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\n\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\n\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\n\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\n\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\n\tat 
io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\n \n\nexception trans path: rollsite(9999)"\n\tdebug_error_string = "{"created":"@1681264583.445007587","description":"Error received from peer ipv4:10.99.68.60:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\n[Roll Site Error TransInfo] \\n location msg=operation POST not supported \\n stack info=scala.NotImplementedError: operation POST not supported\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\\n\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\n\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\\n\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\n\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\n\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\n\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\n\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\n\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\\n\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\n\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\n\\tat 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\n\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\n\\tat java.lang.Thread.run(Thread.java:750)\\n \\n\\nexception trans path: rollsite(9999)","grpc_status":2}"\n>'}}, 'arbiter': {10000: {'retcode': <RetCode.FEDERATED_ERROR: 104>, 'retmsg': 'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNKNOWN\n\tdetails = "\n[Roll Site Error TransInfo] \n location msg=operation POST not supported \n stack info=scala.NotImplementedError: operation POST not supported\n\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\n\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\n\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\n\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\n\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\n\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\n\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\n\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\n\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\n\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\n\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\n\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:750)\n \n\nexception trans path: rollsite(9999)"\n\tdebug_error_string = "{"created":"@1681264589.490125154","description":"Error received from peer ipv4:10.99.68.60:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\n[Roll Site Error TransInfo] \\n location msg=operation POST not supported \\n stack info=scala.NotImplementedError: operation POST not supported\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.processCommand(EggSiteServicer.scala:173)\\n\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:95)\\n\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\n\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\\n\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\n\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\n\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\n\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\n\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\n\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\\n\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\n\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\n\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\n\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\n\\tat java.lang.Thread.run(Thread.java:750)\\n \\n\\nexception trans path: 
rollsite(9999)","grpc_status":2}"\n>'}}})\n'}

To Reproduce

Steps to reproduce the behavior:

  1. Go to 'https://github.com/vsphere-tmm/KubeFATE-on-VCF/blob/main/credit_card_use_case/homo_lr_guest.ipynb'
  2. Run the cell 'Define the Components of Training Pipeline':

# initialize pipeline
pipeline = PipeLine()

# set job initiator
pipeline.set_initiator(role="guest", party_id=guest)

# set participants information
pipeline.set_roles(guest=guest, host=host, arbiter=arbiter)

# add components to pipeline, in order of task execution
pipeline.add_component(reader_0)
pipeline.add_component(reader_1)

pipeline.add_component(dataio_0, data=Data(data=reader_0.output.data))
pipeline.add_component(dataio_1, data=Data(data=reader_1.output.data), model=Model(dataio_0.output.model))

homo_lr_0 = HomoLR(name='homo_lr_0', **param)
pipeline.add_component(homo_lr_0, data=Data(train_data=dataio_0.output.data, validate_data=dataio_1.output.data))

evaluation_0 = Evaluation(name="evaluation_0", eval_type="binary")
evaluation_0.get_party_instance(role='host', party_id=host).component_param(need_run=False)
pipeline.add_component(evaluation_0, data=Data(data=homo_lr_0.output.data))

# compile pipeline once finished adding modules, this step will form conf and dsl files for running job
pipeline.compile()

# fit model
job_parameters = JobParameters(backend=backend, work_mode=work_mode)
pipeline.fit(job_parameters)

# query component summary
import json
print(json.dumps(pipeline.get_component("homo_lr_0").get_summary(), indent=4))

  3. See the error shown above.

Expected behavior

How do I submit the pipeline for training, and how can I fix this error?
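A first step toward reading the callback result in the traceback is to see which parties rejected the job. The sketch below walks the per-role response and lists every party whose retcode is non-zero. The helper name `failed_parties` is hypothetical and not part of FATE; the sample dictionary is a trimmed copy of the response in the traceback, with the long retmsg strings shortened and `RetCode.FEDERATED_ERROR` written as the plain integer 104.

```python
from typing import Any, Dict, List, Tuple


def failed_parties(
    response: Dict[str, Dict[int, Dict[str, Any]]]
) -> List[Tuple[str, int, int]]:
    """Return (role, party_id, retcode) for every party with a non-zero retcode.

    Hypothetical helper for illustration; it is not a FATE API.
    """
    failures = []
    for role, parties in response.items():
        for party_id, status in parties.items():
            # A missing retcode is treated as a failure (-1).
            retcode = int(status.get("retcode", -1))
            if retcode != 0:
                failures.append((role, party_id, retcode))
    return failures


# Trimmed copy of the per-party response from the traceback (assumption:
# retmsg shortened, enum value replaced by its integer 104).
response = {
    "guest": {9999: {"retcode": 0, "retmsg": "success"}},
    "host": {10000: {"retcode": 104, "retmsg": "Federated schedule error, ..."}},
    "arbiter": {10000: {"retcode": 104, "retmsg": "Federated schedule error, ..."}},
}

for role, party_id, retcode in failed_parties(response):
    print(f"{role} {party_id}: retcode {retcode}")
```

In this log the guest (9999) accepted the job, while both roles on party 10000 returned 104, so the failure is on the path from the guest to party 10000 rather than in the pipeline definition itself.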

Desktop (please complete the following information):

  • OS: Windows
  • Browser: Chrome
  • KubeFATE v1.9.0
  • Ingress controller v1.1.3
  • Kubernetes v1.23.5

Mansi2487, Apr 12 '23 02:04

I ran into exactly the same problem when running the Homo-NN-Quick-Start code. Does anyone know how to fix it?

zzgls, Jun 07 '23 06:06

I hit the same problem with KubeFATE 1.8.0: communication between the rollsites fails with the error "operation POST not supported".

martin2384798, Jan 25 '24 12:01
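Since the retmsg strings in this thread are long, a small sketch for pulling out the few fields that matter may help when comparing reports. It extracts the "location msg", the "exception trans path", and the peer address from one retmsg. The function name `summarize_retmsg` and the regular expressions are assumptions written against the text shown in this issue, not a FATE or Eggroll API, and the sample string is a shortened copy of the host's retmsg.

```python
import re
from typing import Dict, Optional

# Patterns written against the Roll Site error text quoted in this issue.
_PATTERNS = {
    "location_msg": re.compile(r"location msg=(.+?)\s*$", re.MULTILINE),
    "trans_path": re.compile(r'exception trans path: ([^"\n]+)'),
    "peer": re.compile(r"Error received from peer (?:ipv4:)?([\d.]+:\d+)"),
}


def summarize_retmsg(retmsg: str) -> Dict[str, Optional[str]]:
    """Extract the key fields from one retmsg; None where a field is absent.

    Hypothetical helper for illustration only.
    """
    summary: Dict[str, Optional[str]] = {}
    for name, pattern in _PATTERNS.items():
        match = pattern.search(retmsg)
        summary[name] = match.group(1).strip() if match else None
    return summary


# Shortened copy of the host retmsg from the traceback (stack frames removed).
sample = (
    'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\n'
    '\tstatus = StatusCode.UNKNOWN\n'
    '\tdetails = "\n[Roll Site Error TransInfo] \n'
    ' location msg=operation POST not supported \n'
    ' stack info=scala.NotImplementedError: operation POST not supported\n'
    '\n\nexception trans path: rollsite(9999)"\n'
    '\tdebug_error_string = "{"created":"@1681264583.445007587",'
    '"description":"Error received from peer ipv4:10.99.68.60:9370"}"\n>'
)

print(summarize_retmsg(sample))
```

For the log in this issue, the summary shows the request was answered by the peer at 10.99.68.60:9370 and the trans path lists only rollsite(9999). One plausible reading, which is an assumption and not confirmed anywhere in this thread, is that the request was handled by a rollsite itself instead of being forwarded on to party 10000's FATE Flow, so the rollsite routing configuration may be worth checking.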