KubeFATE
KubeFATE on Kubernetes (1 master, 1 worker node): Python error "No such file or directory: '1'"
**What deployment mode are you using?**
- Kubernetes.
**What KubeFATE and FATE version are you using?**
v1.9.0
**What OS are you using for docker-compose or Kubernetes? Please also state the OS version.**
OS: Ubuntu Version 18.04
OS: Windows 10 Browser Firefox Version 106.03
To Reproduce
I am getting the error below from the python-0 pod:
[ERROR][2022-11-03 17:22:08,549][command_client_2,pid:17,tid:140321505650432][client.py:96.sync_send] - Error calling to nodemanager-1.nodemanager:37019, command_uri: CommandURI(_uri=v1/egg-pair/runTask), req:ErCommandRequest(id=20221103.172208.543608, uri=v1/egg-pair/runTask, args=[[b'\np202211031721585025250_reader_0_0_guest_9999-py-job-20221103.172208.541952_cleanup-task-nodemanager-1.nodemanager\x12\x07destroy\x1aV\x08\xff\xff\xff\xff\xff\xff\xff\xff\xff\x01\x12>\x08\xff\xff\xff\xff\xff\xff\xff\xff\xff\x01\x12\x01*\x1a+202211031721585025250_reader_0_0_guest_9999"\x01*(\xff\xff\xff\xff\xff\xff\xff\xff\xff\x01*\x9e\x01\nQ202211031721585025250_reader_0_0_guest_9999-py-job-20221103.172208.541952_cleanup\x12\x07d'], len=1], kwargs=[***, len=0]) Traceback (most recent call last): File "/data/projects/fate/eggroll/python/eggroll/core/client.py", line 84, in sync_send response = _command_stub.call(request.to_proto()) File "/opt/python3/lib/python3.8/site-packages/grpc/_channel.py", line 946, in call return _end_unary_response_blocking(state, call, False, None) File "/opt/python3/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking raise _InactiveRpcError(state) grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with: status = StatusCode.UNKNOWN details = "Exception calling application:
==== detail start, at 20221103.172208.548 ====
Traceback (most recent call last):
File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
return func(*args, **kw)
File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 245, in run_task
shutil.rmtree(path)
File "/opt/python3/lib/python3.8/shutil.py", line 718, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/opt/python3/lib/python3.8/shutil.py", line 655, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/opt/python3/lib/python3.8/shutil.py", line 645, in _rmtree_safe_fd
onerror(os.lstat, fullname, sys.exc_info())
File "/opt/python3/lib/python3.8/shutil.py", line 642, in _rmtree_safe_fd
orig_st = entry.stat(follow_symlinks=False)
FileNotFoundError: [Errno 2] No such file or directory: '1'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
return func(*args, **kw)
File "/data/projects/fate/eggroll/python/eggroll/core/command/command_service.py", line 30, in call
call_result = CommandRouter.get_instance()
File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 94, in dispatch
raise e
File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 91, in dispatch
call_result = _method(_instance, *deserialized_args)
File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 194, in wrapper
raise RuntimeError(msg)
RuntimeError:
==== detail start, at 20221103.172208.546 ====
Traceback (most recent call last):
File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper
return func(*args, **kw)
File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 245, in run_task
shutil.rmtree(path)
File "/opt/python3/lib/python3.8/shutil.py", line 718, in rmtree
_rmtree_safe_fd(fd, path, onerror)
File "/opt/python3/lib/python3.8/shutil.py", line 655, in _rmtree_safe_fd
_rmtree_safe_fd(dirfd, fullname, onerror)
File "/opt/python3/lib/python3.8/shutil.py", line 645, in _rmtree_safe_fd
onerror(os.lstat, fullname, sys.exc_info())
File "/opt/python3/lib/python3.8/shutil.py", line 642, in _rmtree_safe_fd
orig_st = entry.stat(follow_symlinks=False)
FileNotFoundError: [Errno 2] No such file or directory: '1'
==== detail end ====
==== detail end ====
"
debug_error_string = "{"created":"@1667496128.548778243","description":"Error received from peer ipv4:10.42.182.16:37019",
"file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"Exception calling application: \n\n==== detail start, at
20221103.172208.548 ====\nTraceback (most recent call last):\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py",
line 187, in wrapper\n return func(*args, **kw)\n File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py",
line 245, in run_task\n shutil.rmtree(path)\n File "/opt/python3/lib/python3.8/shutil.py", line 718, in rmtree\n
_rmtree_safe_fd(fd, path, onerror)\n File "/opt/python3/lib/python3.8/shutil.py", line 655, in _rmtree_safe_fd\n _rmtree_safe_fd(dirfd, fullname, onerror)\n
File "/opt/python3/lib/python3.8/shutil.py", line 645, in _rmtree_safe_fd\n
onerror(os.lstat, fullname, sys.exc_info())\n File "/opt/python3/lib/python3.8/shutil.py",
line 642, in _rmtree_safe_fd\n orig_st = entry.stat(follow_symlinks=False)\nFileNotFoundError:
[Errno 2] No such file or directory: '1'\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n
File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper\n
return func(*args, **kw)\n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_service.py", line 30, in call\n
call_result = CommandRouter.get_instance() \n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 94,
in dispatch\n raise e\n File "/data/projects/fate/eggroll/python/eggroll/core/command/command_router.py", line 91, in dispatch\n
call_result = _method(_instance, *deserialized_args)\n File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 194,
in wrapper\n raise RuntimeError(msg)\nRuntimeError: \n\n==== detail start, at 20221103.172208.546 ====\nTraceback (most recent call last):\n
File "/data/projects/fate/eggroll/python/eggroll/core/utils.py", line 187, in wrapper\n return func(*args, **kw)\n
File "/data/projects/fate/eggroll/python/eggroll/roll_pair/egg_pair.py", line 245, in run_task\n shutil.rmtree(path)\n File "/opt/python3/lib/python3.8/shutil.py",
line 718, in rmtree\n _rmtree_safe_fd(fd, path, onerror)\n File "/opt/python3/lib/python3.8/shutil.py", line 655, in _rmtree_safe_fd\n
_rmtree_safe_fd(dirfd, fullname, onerror)\n File "/opt/python3/lib/python3.8/shutil.py", line 645, in _rmtree_safe_fd\n onerror(os.lstat, fullname, sys.exc_info())\n
File "/opt/python3/lib/python3.8/shutil.py", line 642, in _rmtree_safe_fd\n orig_st = entry.stat(follow_symlinks=False)\nFileNotFoundError: [Errno 2]
No such file or directory: '1'\n\n==== detail end ====\n\n\n\n==== detail end ====\n\n","grpc_status":2}"
I installed KubeFATE with the YAML files below for parties 9999 and 10000, plus an exchange.
**Party_10000**
name: fate-10000
namespace: fate-10000
chartName: fate
chartVersion: v1.9.0
partyId: 10000
registry: ""
pullPolicy:
imagePullSecrets:
  - name: myregistrykey
persistence: false
istio:
  enabled: false
podSecurityPolicy:
  enabled: false
ingressClassName: nginx
modules:
  - rollsite
  - clustermanager
  - nodemanager
  - mysql
  - python
  - fateboard
  - client
computing: Eggroll
federation: Eggroll
storage: Eggroll
algorithm: Basic
device: IPCL
ingress:
  fateboard:
    hosts:
      - name: party10000.fateboard.example.com
  client:
    hosts:
      - name: party10000.notebook.example.com
rollsite:
  type: NodePort
  nodePort: 30101
  exchange:
    ip: 192.168.122.20
    port: 30000
  partyList:
    - partyId: 1
      partyIp: 192.168.122.20
      partyPort: 30000
    - partyId: 9999
      partyIp: 192.168.122.20
      partyPort: 30091
python:
  type: NodePort
  httpNodePort: 30107
  grpcNodePort: 30102
  logLevel: INFO
servingIp: 192.168.122.20
servingPort: 30105
**Party_9999**
name: fate-9999
namespace: fate-9999
chartName: fate
chartVersion: v1.9.0
partyId: 9999
registry: ""
pullPolicy:
imagePullSecrets:
  - name: myregistrykey
persistence: false
istio:
  enabled: false
podSecurityPolicy:
  enabled: false
ingressClassName: nginx
modules:
  - rollsite
  - clustermanager
  - nodemanager
  - mysql
  - python
  - fateboard
  - client
computing: Eggroll
federation: Eggroll
storage: Eggroll
algorithm: Basic
device: IPCL
ingress:
  fateboard:
    hosts:
      - name: party9999.fateboard.example.com
  client:
    hosts:
      - name: party9999.notebook.example.com
rollsite:
  type: NodePort
  nodePort: 30091
  exchange:
    ip: 192.168.122.20
    port: 30000
  partyList:
    - partyId: 1
      partyIp: 192.168.122.20
      partyPort: 30000
    - partyId: 10000
      partyIp: 192.168.122.20
      partyPort: 30101
python:
  type: NodePort
  httpNodePort: 30097
  grpcNodePort: 30092
  logLevel: INFO
servingIp: 192.168.122.20
servingPort: 30095
**Exchange**
name: fate-exchange
namespace: fate-exchange
chartName: fate-exchange
chartVersion: v1.9.0
partyId: 1
registry: ""
pullPolicy:
imagePullSecrets:
  - name: myregistrykey
persistence: false
istio:
  enabled: false
podSecurityPolicy:
  enabled: false
modules:
  - rollsite
rollsite:
  type: NodePort
  nodePort: 30000
  enableTLS: false
  partyList:
    - partyId: 10000
      partyIp: 192.168.122.20
      partyPort: 30101
    - partyId: 9999
      partyIp: 192.168.122.20
      partyPort: 30091
Could you please explain what the issue could be?
Thanks.
Remove the rollsite.partyList part from the cluster files of parties 9999 and 10000.
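As a rough sketch of that suggestion, the rollsite sections of the two party files might look like the fragments below once partyList is removed. The values are copied from the files posted in the question. The assumption here (not confirmed in this thread) is that every other key stays unchanged and that both parties keep reaching each other through the rollsite.exchange entry, while the exchange file keeps its own partyList.

```yaml
# fate-9999 cluster file: rollsite section only (sketch, other keys unchanged)
rollsite:
  type: NodePort
  nodePort: 30091
  exchange:
    ip: 192.168.122.20
    port: 30000
---
# fate-10000 cluster file: rollsite section only (sketch, other keys unchanged)
rollsite:
  type: NodePort
  nodePort: 30101
  exchange:
    ip: 192.168.122.20
    port: 30000
```

After editing, each file would need to be re-applied to its cluster, for example with kubefate cluster update -f followed by the file name, if that is how the clusters were installed.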