Pavel C
## Problem

Not sure I would call this a bug, but it is definitely unintuitive behavior for me. The fundamental issue is that all CNN Policies in SB3 are just...
## Problem

Due to [this validation](https://github.com/HumanCompatibleAI/imitation/blob/5c85ebf02a591dad171946710d80617cfcca108e/src/imitation/data/types.py#L131), environments that return integer rewards cause an exception to be raised, e.g. when I try to collect rollouts from an expert policy. This seems a bit overzealous....
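
To illustrate the kind of failure described here, a minimal sketch follows. It assumes the linked validation rejects any reward array whose dtype is not floating point. `check_float_rewards` and `as_float_rewards` are hypothetical names made up for this example, not part of `imitation`, and the cast is only one possible workaround.

```python
import numpy as np


def check_float_rewards(rews: np.ndarray) -> None:
    """Hypothetical stand-in for a strict float-only reward check (an assumption)."""
    if not np.issubdtype(rews.dtype, np.floating):
        raise ValueError(f"rewards dtype {rews.dtype} not a float")


def as_float_rewards(rews) -> np.ndarray:
    """Cast rewards to float32 so that a float-only check accepts them."""
    return np.asarray(rews, dtype=np.float32)


# Rewards as an integer-reward environment might return them.
int_rews = np.array([0, 1, 1, 0])

try:
    check_float_rewards(int_rews)
    rejected = False
except ValueError:
    rejected = True  # integer dtype is refused by the strict check

float_rews = as_float_rewards(int_rews)
check_float_rewards(float_rews)  # no exception after the cast
```

Casting at the boundary (for instance in an environment wrapper) keeps the strict check intact, whereas relaxing the check itself would accept integer rewards directly; which of the two is preferable is a design decision for the maintainers.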
## Bug description

I'm trying to create a named config for a `CnnRewardNet`. AFAICT this is currently not possible for scripts that use the `reward` ingredient (such as `train_preference_comparisons.py`, due...