
[RLlib] TypeError converting batch (INFOS) to torch tensor with ConnectorV2

Open ciroaceto opened this issue 3 months ago • 2 comments

What happened + What you expected to happen

The method convert_to_torch_tensor fails and raises the following TypeError:

TypeError: can't convert np.ndarray of type numpy.str_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

The new ConnectorV2 doesn't batch INFOS, so it remains a list. This results in the TypeError above whenever the info dict returned from reset() or step() isn't empty.
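
To illustrate the underlying limitation outside RLlib (a minimal sketch, assuming the string-valued infos end up in a numpy array before conversion):

import numpy as np
import torch

# Two non-empty info dicts, as returned from reset()/step().
infos = [{"NotEmpty": "TypeError"}, {"NotEmpty": "TypeError"}]

# Batching the string values produces an array of dtype numpy.str_ ...
values = np.array([info["NotEmpty"] for info in infos])

# ... which torch refuses to convert, raising:
# TypeError: can't convert np.ndarray of type numpy.str_.
tensor = torch.from_numpy(values)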

Versions / Dependencies

Ray: 2.10
gymnasium: 0.28.1
torch: 2.2.1
python: 3.10.11
OS: Windows 11

Reproduction script

Modify the return value of RandomEnv's step method to include a non-empty info dict:

examples/env/random_env.py:85

return (
    self.observation_space.sample(),
    self.reward_space.sample(),
    terminated,
    truncated,
    {"NotEmpty": "TypeError"},
)

Reproduction script:

from gymnasium.spaces import Box, Discrete
from ray.tune.logger import pretty_print
from ray.rllib.algorithms import ppo
from ray.rllib.core.rl_module.rl_module import SingleAgentRLModuleSpec
from ray.rllib.env.single_agent_env_runner import SingleAgentEnvRunner
from ray.rllib.examples.env.action_mask_env import ActionMaskEnv
from ray.rllib.examples.rl_module.action_masking_rlm import TorchActionMaskRLM


rlm_spec = SingleAgentRLModuleSpec(module_class=TorchActionMaskRLM)

config = (
    ppo.PPOConfig()
    .environment(
        ActionMaskEnv,
        env_config={
                "action_space": Discrete(100),
                "observation_space": Box(-1.0, 1.0, (5,)),
            },
    )
    .experimental(
        _enable_new_api_stack=True,
        _disable_preprocessor_api=True,
    )
    .framework("torch")
    .training(model={"uses_new_env_runners": True})
    .rollouts(
        num_rollout_workers=0,
        env_runner_cls=SingleAgentEnvRunner,
    )
    .rl_module(rl_module_spec=rlm_spec)
    .resources(
        num_learner_workers=0,
        num_gpus_per_learner_worker=0,
        num_cpus_for_local_worker=1,
    )
)

algo = config.build()

for _ in range(5):
    result = algo.train()
    print(pretty_print(result))

Issue Severity

Medium: It is a significant difficulty but I can work around it.

ciroaceto avatar Apr 04 '24 15:04 ciroaceto

@ciroaceto Thanks for filing this issue. The Ray version you are using is quite old. Could you check whether you can still replicate this error on the latest version, ray-2.20?

simonsays1980 avatar May 15 '24 12:05 simonsays1980

@simonsays1980 it seems to be working fine with ray-2.20. I modified the ActionMaskEnv (non-empty dict returned from reset() and step()) and executed the action_mask.py example. No error showed up.
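
Roughly, the change was along these lines (a sketch, not the exact diff; the ActionMaskEnv import path is the one from the script above and may differ in newer Ray versions):

from ray.rllib.examples.env.action_mask_env import ActionMaskEnv


class NonEmptyInfoActionMaskEnv(ActionMaskEnv):
    # Return non-empty info dicts from reset() and step() so the INFOS
    # batch contains strings, as in the original reproduction.
    def reset(self, *, seed=None, options=None):
        obs, info = super().reset(seed=seed, options=options)
        return obs, {"NotEmpty": "TypeError"}

    def step(self, action):
        obs, reward, terminated, truncated, info = super().step(action)
        return obs, reward, terminated, truncated, {"NotEmpty": "TypeError"}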

ciroaceto avatar May 16 '24 09:05 ciroaceto