tianshou
tianshou copied to clipboard
Implementation design issues in SubprocVectorEnv
Currently, in SubprocVectorEnv
, there is a single send method to do both reset
and step
. So, it relies on the action being None to call reset
instead of step
. Thus is a bad design choice because there is no reason to forbid the actual action to be None
. In particular, in Jiminy simulator, it means re-using the previous action for the current step. Not to mention that send
is not clear at all, which is already a good reason to rename it.
Secondly, the list of methods that can be called by subprocesses is hard-coded here, which excessively restricted for no actual reason. In my case, my environments have many other methods apart from the standard one, and it should be possible to extend SubprocVectorEnv
to add extra methods, which is not the case because of this. I suggest modifying the else clause:
else:
if not hasattr(env, cmd):
raise NotImplementedError
p.send(getattr(env, cmd)(data))
It should be possible to be generic enough to get rid of "reset", "render" and "seed", and "getattr" elif clauses.
PS: I'm planning my comeback in the next few months, hopefully for long :smiley:
Hello my old friend! Nice to hear from that great news!
I change this to support EnvPool async mode little by little. in EnvPool, both the step
and reset
are split into two folds:
- step = send(action) +recv()
- reset = send() + recv()
and actually there's no need to treat reset and step differently. If we apply an auto-reset environment wrapper, and treat "step(action)
while done=True" as reset, all operations can be vectorized and achieve the best performance. This approach also requires to refactor the collector's code, but after this refactor, collector will be much cleaner.
I see what you mean, and I mostly agree, but in my view what you describe should be a higher level API than what SubprocVectorEnv
is supposed to be. SubprocVectorEnv
is the lowest-level of control you can possibly have and as such all restriction must be very well motivated, which is not the case here. SubprocVectorEnv
is not limited to learning. It could be used for testing, video recording...etc In this context, it makes sense to be able to reset even if done
is false, and to continue step even though done
has been true once in the past. It is not part of OpenAI Gym specification to have to reset the env once done is receive, and it is actually a case that is handle in the official repo as far as I remember for the toy envs. It just throws a warning to make sure the user is aware of it but it can still continue stepping. It can be very useful to debug and analyze complex env.
It should be possible to be generic enough to get rid of "reset", "render" and "seed", and "getattr" elif clauses.
You are probably looking for env_method
as done in SB3 ;) https://github.com/DLR-RM/stable-baselines3/blob/master/stable_baselines3/common/vec_env/subproc_vec_env.py#L48
(together with set_attr
and get_attr
)
What's the status here, has the issue been fully addressed by now?