Log info, not just reward
- [ ] I have marked all applicable categories:
    - [ ] exception-raising bug
    - [ ] RL algorithm bug
    - [ ] documentation request (i.e. "X is missing from the documentation.")
    - [x] new feature request
- [x] I have visited the source website
- [x] I have searched through the issue tracker for duplicates
- [x] I have mentioned version numbers, operating system and environment, where applicable:
An OpenAI Gym environment returns an `(observation, reward, done, info)` tuple at every step, of which tianshou's loggers (as of 0.4.8; I am using `WandbLogger`, but the same issue likely exists with all loggers) only log the reward. Logging `done` doesn't make much sense, and logging `observation` is hard since observations can be very high-dimensional, so instead of logging them we often use video recorders. But `info` is just a dictionary of additional variables that can be very useful for RL debugging. I believe these variables should be logged to TensorBoard, Weights & Biases, and other logging solutions.
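To make the request concrete, here is a minimal sketch of the kind of per-step debug variables that currently never reach the logger. The environment and its `info` keys are made up for illustration, using the pre-0.26 Gym step API that tianshou 0.4.8 targets:

```python
# Illustration only: a hypothetical environment whose `info` dict carries
# debug variables that would be worth logging alongside the reward.
import gym
import numpy as np


class DebugEnv(gym.Env):
    observation_space = gym.spaces.Box(-1.0, 1.0, shape=(4,))
    action_space = gym.spaces.Discrete(2)

    def reset(self):
        self._t = 0
        return self.observation_space.sample()

    def step(self, action):
        self._t += 1
        obs = self.observation_space.sample()
        reward = float(action)
        done = self._t >= 10
        # These are exactly the values the loggers currently drop.
        info = {"distance_to_goal": float(np.abs(obs).sum()), "steps": self._t}
        return obs, reward, done, info
```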
I think the problem for us is that the `info` dict is not gatherable. See https://github.com/thu-ml/tianshou/issues/455#issuecomment-933628187. Please let me know if you have a better solution.
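A small illustration of the gatherability gap (the `info` contents are made up): rewards from parallel environments reduce to a scalar trivially, while `info` dicts need not even share keys or value types:

```python
# Rewards from parallel envs are scalars that stack into one array;
# info dicts can differ in keys, shapes, and value types per env and per step.
import numpy as np

rewards = [1.0, 0.5, -0.2]  # one scalar per parallel env
np.stack(rewards)           # fine: shape (3,), easy to average and log

infos = [
    {"distance_to_goal": 0.3},
    {"distance_to_goal": 0.1, "lives": 2},  # extra key
    {"termination_reason": "timeout"},      # non-numeric value
]
# There is no single obvious way to reduce these to scalar log entries.
```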
It might be gatherable in some specific cases... Maybe instead of assuming that rewards are gatherable and infos aren't, we could define some function `gather_fn(trajectories: Sequence[Sequence[Tuple[ActType, ObsType, float, bool, dict]]]) -> Dict[str, float]`? By default `gather_fn()` would do what the loggers do now (gather rewards only), but the user could redefine it. I guess the trick with `preprocess_fn` that you've described in #455 lets one do this in a roundabout way, but of course it would be a lot more convenient to support this directly.
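A sketch of how the proposed hook could look. This is hypothetical: `gather_fn`, `Transition`, and the default/override split illustrate the idea above and are not part of tianshou's API:

```python
# Each trajectory is a sequence of (action, obs, reward, done, info) tuples;
# gather_fn reduces a batch of trajectories to scalar log entries.
from typing import Any, Dict, Sequence, Tuple

import numpy as np

Transition = Tuple[Any, Any, float, bool, dict]  # (act, obs, rew, done, info)


def default_gather_fn(
    trajectories: Sequence[Sequence[Transition]],
) -> Dict[str, float]:
    """Current behaviour: log episode returns only."""
    returns = [sum(rew for _, _, rew, _, _ in traj) for traj in trajectories]
    return {"reward": float(np.mean(returns))}


def my_gather_fn(
    trajectories: Sequence[Sequence[Transition]],
) -> Dict[str, float]:
    """User override: also log a statistic from each episode's last info."""
    stats = default_gather_fn(trajectories)
    last_infos = [traj[-1][4] for traj in trajectories]
    stats["distance_to_goal"] = float(
        np.mean([info.get("distance_to_goal", np.nan) for info in last_infos])
    )
    return stats
```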
Agree!
Besides, the last `info` returned by the environment is the conventional place to store various episode statistics, so maybe just always log the last `info`?
That's reasonable, but if you call `collector.collect(n_step=x)`, it may collect multiple episodes instead of only one, so the issue cannot be bypassed this way.
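For reference, a rough sketch of the `preprocess_fn` workaround that also covers the multi-episode case: the hook sees `done` and `info` for every parallel environment at each step, so it can forward the final `info` of each finished episode even when `collect(n_step=...)` spans several episodes. This assumes the tianshou 0.4.x hook contract (keyword arguments `obs_next`, `rew`, `done`, `info`, `policy`, `env_id` on each step) and `info` arriving as a per-env sequence of plain dicts; `log_scalar` is a hypothetical logging helper, not a tianshou function:

```python
from tianshou.data import Batch


def make_info_logging_preprocess_fn(log_scalar):
    def preprocess_fn(**kwargs):
        # On reset the hook only receives `obs` and `env_id`; nothing to log.
        if "info" in kwargs:
            for i, done in enumerate(kwargs["done"]):
                if not done:
                    continue
                # Assumed: `info` is a per-env sequence of plain dicts,
                # so this is the last info of one finished episode.
                final_info = kwargs["info"][i]
                for key, value in final_info.items():
                    if isinstance(value, (int, float)):
                        log_scalar(f"episode/{key}", float(value))
        return Batch()  # leave the collected data unmodified
    return preprocess_fn


# Usage sketch:
# collector = Collector(policy, envs, buffer,
#                       preprocess_fn=make_info_logging_preprocess_fn(log_scalar))
```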
This should be addressed with the custom callbacks that will be implemented to resolve #895. @vadim0x60, do you agree? If yes, we can close this issue.
Sounds good!