flybody
Ray distributed training error
When I run train_dmpo_ray.py, the log shows that an object cannot be pickled. The script works on my PC but fails on the lab server.
Traceback (most recent call last):
File "/home/fanbo/fly/flybody/train_dmpo_ray.py", line 190, in <module>
learner = Learner.remote(replay_server.get_server_address.remote(),
File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/actor.py", line 1297, in remote
return self._remote(args=args, kwargs=kwargs, **self._default_options)
File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/util/tracing/tracing_helper.py", line 384, in _invocation_actor_class_remote_span
return method(self, args, kwargs, *_args, **_kwargs)
File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/actor.py", line 1731, in _remote
actor_id = worker.core_worker.create_actor(
File "python/ray/_raylet.pyx", line 3811, in ray._raylet.CoreWorker.create_actor
File "python/ray/_raylet.pyx", line 3818, in ray._raylet.CoreWorker.create_actor
File "python/ray/_raylet.pyx", line 907, in ray._raylet.prepare_args_and_increment_put_refs
File "python/ray/_raylet.pyx", line 898, in ray._raylet.prepare_args_and_increment_put_refs
File "python/ray/_raylet.pyx", line 948, in ray._raylet.prepare_args_internal
File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/_private/serialization.py", line 672, in serialize
return self._serialize_to_msgpack(value)
File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/_private/serialization.py", line 605, in _serialize_to_msgpack
pickle5_serialized_object = self._serialize_to_pickle5(
File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/_private/serialization.py", line 552, in _serialize_to_pickle5
raise e
File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/_private/serialization.py", line 547, in _serialize_to_pickle5
inband = pickle.dumps(
File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/cloudpickle/cloudpickle.py", line 1479, in dumps
cp.dump(obj)
File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/cloudpickle/cloudpickle.py", line 1245, in dump
return super().dump(obj)
_pickle.PicklingError: Can't pickle <functools._lru_cache_wrapper object at 0x7725f609dc70>: it's not the same object as typing.Generic.__class_getitem__
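The traceback fails while Ray serializes the arguments passed to Learner.remote(...), so one way to narrow this down is to try serializing each argument on its own. The sketch below is only an illustration: find_unpicklable and the placeholder arguments are hypothetical names, not part of flybody or Ray, and it defaults to the standard-library pickle. Ray uses its bundled cloudpickle (ray/cloudpickle in the traceback), so passing ray.cloudpickle.dumps as the dumps argument should mirror the failing call more closely.

```python
import pickle
import threading


def find_unpicklable(named_args, dumps=pickle.dumps):
    """Return {name: error message} for every argument that `dumps` rejects.

    `named_args` maps a label to the object you intend to pass to a remote
    call. `dumps` is any pickle-compatible serializer (assumption: pass
    ray.cloudpickle.dumps to approximate what Ray does).
    """
    failures = {}
    for name, value in named_args.items():
        try:
            dumps(value)
        except Exception as exc:  # the exception type depends on the object
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Placeholder arguments, not the real Learner arguments.
    # A lock is used here only because it is a well-known unpicklable object.
    candidates = {"learning_rate": 1e-4, "lock": threading.Lock()}
    for name, error in find_unpicklable(candidates).items():
        print(f"{name} -> {error}")
```

To use it against the real script, replace the placeholder dictionary with the actual arguments built before line 190 of train_dmpo_ray.py.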
I also encountered this issue. Have you resolved it? Could you share the solution?
FYI @jn12-29 @FanboZhao I just fixed this by downgrading Ray with pip install ray==2.44.0
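Since the script works on one machine and fails on another, comparing the installed versions on both may help confirm whether Ray differs between them. This is a minimal sketch using only the standard library; environment_report and REPORTED_WORKING_RAY are hypothetical names, and 2.44.0 is simply the version reported as working in this thread, not a confirmed root cause.

```python
import platform
from importlib import metadata

# Version reported as working in this thread (not a verified fix for all setups).
REPORTED_WORKING_RAY = "2.44.0"


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def environment_report(packages=("ray",)):
    """Collect the Python version and the versions of the given packages."""
    report = {"python": platform.python_version()}
    for package in packages:
        report[package] = installed_version(package)
    return report


if __name__ == "__main__":
    report = environment_report()
    for name, version in report.items():
        print(f"{name}: {version}")
    if report["ray"] != REPORTED_WORKING_RAY:
        print(f"Ray differs from the version reported as working ({REPORTED_WORKING_RAY}).")
```

Running it on both the PC and the lab server and diffing the output shows at a glance whether the two environments match.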