2 comments of Erik Jenner
Thanks for spotting and describing this; I'm not sure why I set `deterministic_policy=True` in the exploration wrapper. So it should be fine to make that `False`, at least I agree it...
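To illustrate what the `deterministic_policy` flag changes, here is a minimal, self-contained sketch of an exploration wrapper. The class `ExplorationWrapperSketch` and its parameters (`random_prob`, `switch_prob`, `policy_probs`) are hypothetical stand-ins, not the API of any particular library. The assumption is that the wrapper occasionally switches between uniformly random actions and policy actions; `deterministic_policy` only controls how the policy's own action is chosen (argmax versus sampling).

```python
import random
from typing import Callable, Sequence


class ExplorationWrapperSketch:
    """Hypothetical exploration wrapper (illustrative only, not a library API).

    With probability `switch_prob` per step, it re-draws whether to act
    uniformly at random (with probability `random_prob`) or follow the policy.
    """

    def __init__(
        self,
        policy_probs: Callable[[object], Sequence[float]],
        n_actions: int,
        random_prob: float,
        switch_prob: float,
        rng: random.Random,
        deterministic_policy: bool = False,
    ) -> None:
        self.policy_probs = policy_probs
        self.n_actions = n_actions
        self.random_prob = random_prob
        self.switch_prob = switch_prob
        self.rng = rng
        self.deterministic_policy = deterministic_policy
        # Initial mode: random actions or policy actions.
        self.use_random = self.rng.random() < self.random_prob

    def _policy_action(self, obs: object) -> int:
        probs = self.policy_probs(obs)
        if self.deterministic_policy:
            # Always take the most likely action: no exploration from the policy itself.
            return max(range(len(probs)), key=probs.__getitem__)
        # Sample from the policy's distribution, so lower-probability actions still occur.
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]

    def act(self, obs: object) -> int:
        # Occasionally re-draw whether this stretch of steps is random or policy-driven.
        if self.rng.random() < self.switch_prob:
            self.use_random = self.rng.random() < self.random_prob
        if self.use_random:
            return self.rng.randrange(self.n_actions)
        return self._policy_action(obs)
```

With `random_prob=0.0` and a policy that always outputs `[0.6, 0.4]`, the deterministic variant only ever picks action 0, while the stochastic variant also produces action 1. This is why setting the flag to `False` yields more diverse trajectories for exploration.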
> I've seen policies and reward networks sometimes have the number of envs get baked into their expected observation/action shape in the past. Although I thought that was no longer...
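The concern above is about a model recording the full batched shape, including the number of environments, instead of only the per-observation shape. The sketch below uses two hypothetical helpers (not functions from any library) to contrast the two cases. A check that stores the full shape breaks as soon as `n_envs` changes, while one that validates only the trailing dimensions accepts any batch size.

```python
from typing import Tuple


def brittle_check(batch_shape: Tuple[int, ...], expected_shape: Tuple[int, ...]) -> None:
    """Hypothetical check with n_envs baked in: compares the whole shape."""
    if tuple(batch_shape) != tuple(expected_shape):
        raise ValueError(f"expected {expected_shape}, got {batch_shape}")


def batch_agnostic_check(batch_shape: Tuple[int, ...], obs_shape: Tuple[int, ...]) -> int:
    """Hypothetical check that ignores the leading (n_envs / batch) dimension.

    Returns the batch size after validating the per-observation dimensions.
    """
    if tuple(batch_shape[1:]) != tuple(obs_shape):
        raise ValueError(f"expected (*, {obs_shape}), got {batch_shape}")
    return batch_shape[0]
```

For example, a model created with 8 environments (`expected_shape=(8, 4)`) would reject a batch of shape `(1, 4)` under `brittle_check`, whereas `batch_agnostic_check((1, 4), (4,))` simply returns a batch size of 1.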