Log info, not just reward
- [ ] I have marked all applicable categories:
    - [ ] exception-raising bug
    - [ ] RL algorithm bug
    - [ ] documentation request (i.e. "X is missing from the documentation.")
    - [x] new feature request
- [x] I have visited the source website
- [x] I have searched through the issue tracker for duplicates
- [x] I have mentioned version numbers, operating system and environment, where applicable:
An OpenAI Gym environment returns an `(observation, reward, done, info)` tuple at every step, of which tianshou's loggers (as of 0.4.8; I am using `WandbLogger`, but the same issue likely exists with all loggers) only log the reward. Logging `done` doesn't make much sense, and logging `observation` is hard since observations can be very high-dimensional, so instead of logging them we often use video recorders. But `info` is just a dictionary of additional variables that can be very useful for RL debugging. I believe these variables should be logged to TensorBoard, Weights & Biases, and other logging solutions.
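To make the request concrete, here is a minimal sketch of the kind of per-step debug variables that currently never reach the logger. The environment and its `info` keys are made up for illustration, using the pre-0.26 Gym step API that tianshou 0.4.8 targets:

```python
# Illustration only: a hypothetical environment whose `info` dict carries
# debug variables that would be worth logging alongside the reward.
import gym
import numpy as np


class DebugEnv(gym.Env):
    observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))
    action_space = gym.spaces.Discrete(2)

    def reset(self):
        self._t = 0
        return self.observation_space.sample()

    def step(self, action):
        self._t += 1
        obs = self.observation_space.sample()
        reward = float(action)
        done = self._t >= 10
        # These are exactly the values the loggers currently drop.
        info = {"distance_to_goal": float(np.abs(obs).sum()), "steps": self._t}
        return obs, reward, done, info
```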
I think the problem for us is that the `info` dict is not gatherable. See https://github.com/thu-ml/tianshou/issues/455#issuecomment-933628187. Please let me know if you have a better solution.
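A small illustration of the gatherability gap (the `info` contents are made up): rewards from parallel environments reduce to a scalar trivially, while `info` dicts need not even share keys or value types:

```python
# Rewards from parallel envs are scalars that stack into one array;
# info dicts can differ in keys, shapes, and value types per env and per step.
import numpy as np

rewards = [1.0, 0.5, -0.2]  # one scalar per parallel env
np.stack(rewards)           # fine: shape (3,), easy to average and log

infos = [
    {"distance_to_goal": 0.3},
    {"distance_to_goal": 0.1, "lives": 2},  # extra key
    {"termination_reason": "timeout"},      # non-numeric value
]
# There is no single obvious way to reduce these to scalar log entries.
```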
It might be gatherable in some specific cases... Maybe instead of assuming that rewards are gatherable and infos aren't, we could define some function `gather_fn(trajectories: Sequence[Sequence[Tuple[ActType, ObsType, float, bool, dict]]]) -> Dict[str, float]`? By default `gather_fn()` would do what the loggers do now (gather rewards only), but the user could redefine it. I guess the trick with `preprocess_fn` that you've described in #455 lets one do this in a roundabout way, but of course it would be a lot more convenient to support this directly.
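A sketch of how the proposed hook could look. This is hypothetical: `gather_fn`, `Transition`, and the default/override split illustrate the idea above and are not part of tianshou's API:

```python
# Each trajectory is a sequence of (action, obs, reward, done, info) tuples;
# gather_fn reduces a batch of trajectories to scalar log entries.
from typing import Any, Dict, Sequence, Tuple

import numpy as np

Transition = Tuple[Any, Any, float, bool, dict]  # (act, obs, rew, done, info)


def default_gather_fn(
    trajectories: Sequence[Sequence[Transition]],
) -> Dict[str, float]:
    """Current behaviour: log episode returns only."""
    returns = [sum(rew for _, _, rew, _, _ in traj) for traj in trajectories]
    return {"reward": float(np.mean(returns))}


def my_gather_fn(
    trajectories: Sequence[Sequence[Transition]],
) -> Dict[str, float]:
    """User override: also log a statistic from each episode's last info."""
    stats = default_gather_fn(trajectories)
    last_infos = [traj[-1][4] for traj in trajectories]
    stats["distance_to_goal"] = float(
        np.mean([info.get("distance_to_goal", np.nan) for info in last_infos])
    )
    return stats
```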
Agree!
Besides, the last `info` returned by the environment is the conventional place to store various episode statistics, so maybe just always log the last `info`?
That's reasonable, but if you call `collector.collect(n_step=x)`, it may collect multiple episodes instead of only one, so the issue cannot be bypassed this way.
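For reference, a rough sketch of the `preprocess_fn` workaround that also covers the multi-episode case: the hook sees `done` and `info` for every parallel environment at each step, so it can forward the final `info` of each finished episode even when `collect(n_step=...)` spans several episodes. This assumes the tianshou 0.4.x hook contract (keyword arguments `obs_next`, `rew`, `done`, `info`, `policy`, `env_id` on each step) and `info` arriving as a per-env sequence of plain dicts; `log_scalar` is a hypothetical logging helper, not a tianshou function:

```python
from tianshou.data import Batch


def make_info_logging_preprocess_fn(log_scalar):
    def preprocess_fn(**kwargs):
        # On reset the hook only receives `obs` and `env_id`; nothing to log.
        if "info" in kwargs:
            for i, done in enumerate(kwargs["done"]):
                if not done:
                    continue
                # Assumed: `info` is a per-env sequence of plain dicts,
                # so this is the last info of one finished episode.
                final_info = kwargs["info"][i]
                for key, value in final_info.items():
                    if isinstance(value, (int, float)):
                        log_scalar(f"episode/{key}", float(value))
        return Batch()  # leave the collected data unmodified
    return preprocess_fn


# Usage sketch:
# collector = Collector(policy, envs, buffer,
#                       preprocess_fn=make_info_logging_preprocess_fn(log_scalar))
```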
This should be addressed with the custom callbacks that will be implemented to resolve #895. @vadim0x60, do you agree? If yes, we can close this issue.
Sounds good!