
Log info, not just reward

Open vadim0x60 opened this issue 2 years ago • 5 comments

  • [ ] I have marked all applicable categories:
    • [ ] exception-raising bug
    • [ ] RL algorithm bug
    • [ ] documentation request (i.e. "X is missing from the documentation.")
    • [x] new feature request
  • [x] I have visited the source website
  • [x] I have searched through the issue tracker for duplicates
  • [x] I have mentioned version numbers, operating system and environment, where applicable:

An OpenAI Gym environment returns an (observation, reward, done, info) tuple at every step, of which tianshou's loggers (as of 0.4.8; I am using WandbLogger, but the same issue likely exists with all loggers) only log the reward. Logging done doesn't make much sense, and logging observations is hard since they can be very high-dimensional - instead of logging them, we often use video recorders. But info is just a dictionary of additional variables that can be very useful for RL debugging. I believe these variables should be logged to TensorBoard, Weights & Biases, and other logging backends.
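For instance, one could imagine filtering an info dict down to its numeric scalars before sending it to a logger (a minimal sketch; `scalar_entries` is a hypothetical helper, not part of tianshou's API):

```python
def scalar_entries(info: dict) -> dict:
    """Keep only numeric scalars from a step's info dict, the kind of
    values a scalar logger (TensorBoard, W&B) can ingest directly.
    Booleans are excluded even though bool subclasses int."""
    return {
        k: float(v)
        for k, v in info.items()
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    }

info = {
    "distance_to_goal": 3.5,        # useful debugging scalar
    "n_collisions": 2,              # useful debugging scalar
    "TimeLimit.truncated": False,   # bool: skipped
    "raw_lidar": [0.1, 0.2, 0.3],   # non-scalar: skipped
}
print(scalar_entries(info))  # {'distance_to_goal': 3.5, 'n_collisions': 2.0}
```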

vadim0x60 avatar May 31 '22 11:05 vadim0x60

I think the problem for us is that the info dict is not gatherable. See https://github.com/thu-ml/tianshou/issues/455#issuecomment-933628187 Please let me know if you have a better solution.
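To illustrate the "not gatherable" point: in a vectorized setup, parallel environments can return info dicts with different keys at the same step, so there is no fixed schema to stack into arrays the way rewards are stacked (a toy illustration, not tianshou code):

```python
# Infos from two parallel envs at the same collection step.
infos = [
    {"episode": {"r": 10.0, "l": 200}},  # env 0 just finished an episode
    {"distance_to_goal": 1.3},           # env 1 is mid-episode
]

# Rewards are always a single float per env, so they stack trivially.
# Infos are ragged: the envs share no keys here, so there is nothing
# that can be aggregated uniformly across the batch.
shared_keys = set(infos[0]) & set(infos[1])
print(shared_keys)  # set()
```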

Trinkle23897 avatar Jun 01 '22 03:06 Trinkle23897

It might be gatherable in some specific cases... Maybe instead of assuming that rewards are gatherable and infos aren't, we could define a function gather_fn(trajectories: Sequence[Tuple[ActType, ObsType, float, bool, dict]]) -> Dict[str, float]?

By default, gather_fn() would do what happens now (gather rewards only), but the user could redefine it. I guess the trick with preprocess_fn that you've described in #455 lets one do this in a roundabout way, but of course it would be much more convenient to support this directly.
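A sketch of what the proposed default could look like, reproducing today's behavior of aggregating rewards only (the signature follows the proposal above; it is not tianshou's actual API, and `default_gather_fn` is a hypothetical name):

```python
from typing import Any, Dict, List, Sequence, Tuple

# A trajectory as proposed above: (action, observation, reward, done, info).
Step = Tuple[Any, Any, float, bool, dict]

def default_gather_fn(trajectories: Sequence[Sequence[Step]]) -> Dict[str, float]:
    """Default: reduce each trajectory to its return (sum of rewards),
    then report the mean return, ignoring infos entirely."""
    returns: List[float] = [sum(step[2] for step in traj) for traj in trajectories]
    return {"reward": sum(returns) / len(returns)}

trajs = [
    [(0, None, 1.0, False, {}), (1, None, 2.0, True, {"dist": 0.5})],
    [(0, None, 3.0, True, {"dist": 0.1})],
]
print(default_gather_fn(trajs))  # {'reward': 3.0}
```

A user-defined gather_fn could instead dig into the info dicts of its trajectories and return any scalar summaries it likes, since it sees the raw per-step tuples.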

vadim0x60 avatar Jun 07 '22 13:06 vadim0x60

> I guess the trick with preprocess_fn that you've described in https://github.com/thu-ml/tianshou/issues/455 lets one do this in a roundabout way, but of course it would be a lot more convenient to support this directly

Agree!

Trinkle23897 avatar Jun 11 '22 20:06 Trinkle23897

Besides, the last info returned by the environment is the conventional place to store various episode statistics, so maybe just always log the last info?
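This follows the Gym convention where wrappers such as RecordEpisodeStatistics place episode summaries in the terminal step's info under the "episode" key. A minimal sketch of logging only that last info (`log_last_infos` is a hypothetical helper, and the episodes here are plain lists of info dicts rather than tianshou buffers):

```python
from typing import Dict, List, Sequence

def log_last_infos(episodes: Sequence[Sequence[dict]]) -> List[Dict[str, float]]:
    """For each finished episode, pick out the statistics stored in the
    final step's info["episode"] (the Gym RecordEpisodeStatistics convention:
    "r" = episode return, "l" = episode length)."""
    return [ep[-1].get("episode", {}) for ep in episodes]

episodes = [
    [{"x": 1}, {"episode": {"r": 21.0, "l": 21}}],
    [{"episode": {"r": 13.0, "l": 13}}],
]
print(log_last_infos(episodes))  # [{'r': 21.0, 'l': 21}, {'r': 13.0, 'l': 13}]
```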

vadim0x60 avatar Jun 29 '22 16:06 vadim0x60

That's reasonable, but if you call collector.collect(n_step=x), it may collect multiple episodes rather than just one, so there is no single "last info" per collect call and the issue cannot be bypassed this way.

Trinkle23897 avatar Jun 29 '22 17:06 Trinkle23897

This should be addressed with the custom callbacks that will be implemented to resolve #895. @vadim0x60, do you agree? If yes, we can close this issue.

MischaPanch avatar Sep 07 '23 12:09 MischaPanch

Sounds good!

vadim0x60 avatar Sep 12 '23 10:09 vadim0x60