imitation icon indicating copy to clipboard operation
imitation copied to clipboard

Consider allowing integer reward in trajectories

Open PavelCz opened this issue 1 year ago • 0 comments

Problem

Due to this validation environments returning integer rewards will throw an exception, e.g. when I try to collect rollouts from an expert policy.

This seems a bit overzealous. Here are some things to consider:

  • gym did somewhat specify float (see), but it doesn't seem like this was enforced anywhere.
    • e.g. gym-minigrid can return 0 (as an integer) in some situations
  • gymnasium now specifies typing.SupportsFloat (see), which allows integers (and envs such as MiniGrid still use ints).
  • sb3 vec_envs are documented as returning float (see)
    • However, this is not enforced in either the VecEnvWrapper or make_vec_env, AFAICT. That means I can wrap a valid gymnasium env in a VecEnv and my venv will return integers.
  • The imitation doc does say the dtype of a reward in TrajectoryWithRew is float (see).

Solution

I suggest changing the validation to np.number or typing.SupportsFloat (exists since python3.5) or np.floating or np.integer and updating the doc appropriately. Granted, this is all a bit awkward, since there is no exact equivalent of typing.SupportsFloat in numpy. If this is a desired change, I could open a PR on this.

Possible alternative solutions

If you want to support gym envs that return integer rewards but keep using floats internally, maybe we could convert integers to float at some point in the pipeline. This could for example be done in the RolloutInfoWrapper.

Currently, my workaround is a wrapper that converts all rewards to floats. Obviously not a big deal, but seems kind of unnecessary.

PavelCz avatar Sep 12 '23 01:09 PavelCz