[RLlib] TypeError converting batch (INFOS) to torch tensor with ConnectorV2
What happened + What you expected to happen
The method convert_to_torch_tensor fails with the following TypeError:
TypeError: can't convert np.ndarray of type numpy.str_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
The new ConnectorV2 doesn't batch INFOS, so it remains a list. This results in the TypeError above whenever the info dict returned from reset or step isn't empty.
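To illustrate the failure mode described above outside of RLlib, here is a minimal sketch using only NumPy and PyTorch. It first shows that torch.from_numpy rejects a NumPy string array (the same numpy.str_ error), then sketches one possible workaround: a converter that leaves leaves with unsupported dtypes untouched instead of raising. The names to_torch_skip_non_numeric and reproduces_type_error are hypothetical and are not part of RLlib's API; RLlib's own convert_to_torch_tensor may behave differently. The dtype list is copied from the error message quoted above, and newer PyTorch releases may accept more types.

```python
import numpy as np
import torch

# Dtypes that torch.from_numpy accepts, copied from the error message in this
# issue. Newer PyTorch releases may accept more (e.g. uint16/uint32/uint64).
_TORCH_COMPATIBLE = (
    np.dtype(np.float64), np.dtype(np.float32), np.dtype(np.float16),
    np.dtype(np.complex64), np.dtype(np.complex128),
    np.dtype(np.int64), np.dtype(np.int32), np.dtype(np.int16), np.dtype(np.int8),
    np.dtype(np.uint8), np.dtype(np.bool_),
)


def reproduces_type_error() -> bool:
    """Return True if torch rejects a NumPy string array, as in this issue."""
    try:
        torch.from_numpy(np.asarray(["TypeError"]))  # dtype '<U9' (numpy.str_)
    except TypeError:
        return True
    return False


def to_torch_skip_non_numeric(x):
    """Hypothetical helper, NOT an RLlib API.

    Recursively converts dicts/lists/tuples of arrays or scalars to torch
    tensors, but returns leaves with unsupported dtypes (strings, objects)
    unchanged instead of raising.
    """
    if isinstance(x, dict):
        return {k: to_torch_skip_non_numeric(v) for k, v in x.items()}
    if isinstance(x, (list, tuple)):
        return type(x)(to_torch_skip_non_numeric(v) for v in x)
    arr = np.asarray(x)
    if any(arr.dtype == d for d in _TORCH_COMPATIBLE):
        return torch.from_numpy(arr)
    return x  # e.g. the "TypeError" string inside an info dict
```

Skipping is only one option; a connector could instead drop INFOS before tensor conversion or batch it separately. The sketch just shows why an un-batched list of string-valued info dicts trips the conversion.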
Versions / Dependencies
Ray 2.10, gymnasium 0.28.1, torch 2.2.1, Python 3.10.11, OS: Windows 11
Reproduction script
Modify the return statement of RandomEnv's step method so that it returns a non-empty info dict:
examples/env/random_env.py:85
return (
    self.observation_space.sample(),
    self.reward_space.sample(),
    terminated,
    truncated,
    {"NotEmpty": "TypeError"},
)
Then run the following script:
from gymnasium.spaces import Box, Discrete
from ray.tune.logger import pretty_print
from ray.rllib.algorithms import ppo
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec
from ray.rllib.env.single_agent_env_runner import SingleAgentEnvRunner
from ray.rllib.examples.env.action_mask_env import ActionMaskEnv
from ray.rllib.examples.rl_module.action_masking_rlm import TorchActionMaskRLM

rlm_spec = SingleAgentRLModuleSpec(module_class=TorchActionMaskRLM)

config = (
    ppo.PPOConfig()
    .environment(
        ActionMaskEnv,
        env_config={
            "action_space": Discrete(100),
            "observation_space": Box(-1.0, 1.0, (5,)),
        },
    )
    .experimental(
        _enable_new_api_stack=True,
        _disable_preprocessor_api=True,
    )
    .framework("torch")
    .training(model={"uses_new_env_runners": True})
    .rollouts(
        num_rollout_workers=0,
        env_runner_cls=SingleAgentEnvRunner,
    )
    .rl_module(rl_module_spec=rlm_spec)
    .resources(
        num_learner_workers=0,
        num_gpus_per_learner_worker=0,
        num_cpus_for_local_worker=1,
    )
)

algo = config.build()
for _ in range(5):
    result = algo.train()
    print(pretty_print(result))
Issue Severity
Medium: It is a significant difficulty but I can work around it.
@ciroaceto Thanks for filing this issue. The Ray version you are using is quite old. Could you check whether you can reproduce this error with the current version, ray-2.20?
@simonsays1980 It seems to be working fine with ray-2.20. I modified ActionMaskEnv to return a non-empty info dict from reset() and step() and ran the action_mask.py example. No error showed up.