Erik Jenner

Results: 2 comments by Erik Jenner

Thanks for spotting and describing this. I'm not sure why I set `deterministic_policy=True` in the exploration wrapper, so it should be fine to make that `False`; at least I agree it...

> I've seen policies and reward networks sometimes have the number of envs get baked into their expected observation/action shape in the past. Although I thought that was no longer...