
Ray distributed training error

Open FanboZhao opened this issue 3 months ago • 3 comments

When running train_dmpo_ray.py, the log shows that an object can't be pickled. The script works on my PC but fails on the lab server.

Traceback (most recent call last):
  File "/home/fanbo/fly/flybody/train_dmpo_ray.py", line 190, in <module>
    learner = Learner.remote(replay_server.get_server_address.remote(),
  File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/actor.py", line 1297, in remote
    return self._remote(args=args, kwargs=kwargs, **self._default_options)
  File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/util/tracing/tracing_helper.py", line 384, in _invocation_actor_class_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/actor.py", line 1731, in _remote
    actor_id = worker.core_worker.create_actor(
  File "python/ray/_raylet.pyx", line 3811, in ray._raylet.CoreWorker.create_actor
  File "python/ray/_raylet.pyx", line 3818, in ray._raylet.CoreWorker.create_actor
  File "python/ray/_raylet.pyx", line 907, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 898, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 948, in ray._raylet.prepare_args_internal
  File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/_private/serialization.py", line 672, in serialize
    return self._serialize_to_msgpack(value)
  File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/_private/serialization.py", line 605, in _serialize_to_msgpack
    pickle5_serialized_object = self._serialize_to_pickle5(
  File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/_private/serialization.py", line 552, in _serialize_to_pickle5
    raise e
  File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/_private/serialization.py", line 547, in _serialize_to_pickle5
    inband = pickle.dumps(
  File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/home/fanbo/miniconda3/envs/flybody/lib/python3.10/site-packages/ray/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
_pickle.PicklingError: Can't pickle <functools._lru_cache_wrapper object at 0x7725f609dc70>: it's not the same object as typing.Generic.__class_getitem__

FanboZhao, Sep 23 '25 12:09
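
For context on the message itself: pickle serializes functions by reference (module name plus qualified name) and raises this kind of error when that name resolves to a different object than the one being pickled. Below is a minimal sketch of that check using only the standard library. The module `demo_mod` and the function `lookup` are hypothetical stand-ins, and the sketch does not reproduce the Ray failure or identify what rebinds `typing.Generic.__class_getitem__` on the server.

```python
import pickle
import sys
import types
from functools import lru_cache


def make_cached():
    """Return a fresh lru_cache-wrapped function (a new object on each call)."""
    @lru_cache(maxsize=None)
    def lookup(x):
        return x
    return lookup


# Hypothetical module standing in for a library that owns the cached function.
demo = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = demo

original = make_cached()
original.__module__ = "demo_mod"   # tell pickle where to look the object up
original.__qualname__ = "lookup"
demo.lookup = original

# demo_mod.lookup is the very same object, so pickling by reference succeeds.
pickle.dumps(original)
pickled_before_rebind = True

# Rebind the name to a different wrapper object.
demo.lookup = make_cached()

try:
    pickle.dumps(original)
    error = None
except pickle.PicklingError as exc:
    # The name now points at another object, so pickle refuses.
    error = exc

print(type(error).__name__, error)
```

If the same kind of mismatch is happening on the server, something in that environment is probably replacing the cached function after it was first captured, which would be consistent with the failure depending on installed package versions.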

I also encountered this issue. Have you resolved it? Could you share the solution?

jn12-29, Oct 22 '25 05:10

FYI @jn12-29 @FanboZhao: I just fixed this by downgrading Ray with `pip install ray==2.44.0`.

alexispomares, Nov 06 '25 12:11
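
Since the script works on one machine and fails on another, and the reported fix is a Ray version pin, comparing the interpreter and Ray versions on both machines is a reasonable first check. A minimal sketch follows; the helper name `installed_version` is made up for illustration, and the thread does not say which Ray versions trigger the error.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Run this on both the working PC and the failing server, then compare.
report = {
    "python": platform.python_version(),
    "ray": installed_version("ray"),
}
print(report)
```

If the two reports differ in the `ray` entry, pinning the server to the version from the working machine (or to 2.44.0, as reported above) is the natural next thing to try.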