ReinforcementLearning.jl
Improve the logging mechanism during training
Currently, each policy or learner allocates a temporary buffer to record intermediate data, and an extra hook has to be added just to read that data out. There are at least three problems with this approach (a sketch of the current pattern follows the list):
- Each policy or learner needs to define extra fields solely for caching this intermediate data.
- In some policies the intermediate data is updated several times within a single update step (PPO, for example), so hooks can only see the statistics of the last inner update.
- We can't filter out unwanted statistics early, before they are computed and cached.
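To make the pain point concrete, here is a minimal sketch of the status quo. The names (`MyLearner`, `LossLoggingHook`, `compute_loss`) are hypothetical and not the actual ReinforcementLearning.jl API; the point is the extra cache field and the extra hook needed just to surface one statistic:

```julia
# Hypothetical sketch of the current pattern (illustrative names only).
mutable struct MyLearner
    last_loss::Float32   # extra field that exists only so a hook can read it
end

# Stand-in for the real loss computation.
compute_loss(learner, batch) = rand(Float32)

function update!(learner::MyLearner, batch)
    for _ in 1:4                         # e.g. PPO runs several inner epochs
        learner.last_loss = compute_loss(learner, batch)
    end
    # Only the loss of the final inner epoch survives in `last_loss`;
    # a hook that fires after `update!` never sees the earlier values.
end

# The extra hook needed just to read the cached statistic back out.
struct LossLoggingHook end
(h::LossLoggingHook)(learner) = println("loss = ", learner.last_loss)

learner = MyLearner(0.0f0)
update!(learner, nothing)
LossLoggingHook()(learner)
```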
To improve this, the idea is simple: we can leverage the logging system built into Julia together with some utilities from LoggingExtras.jl.
Basically, we can replace all the existing logging lines with `@debug "index/name" x=y ... _group=DEFAULT_GROUP` and provide a filter (see the concepts in LoggingExtras.jl) that extracts all logs whose `_group` is `DEFAULT_GROUP`. Any log sink can then be used to write the logs out.
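A minimal sketch of what this could look like, assuming a hypothetical `DEFAULT_GROUP` constant (any distinctive symbol would do) and hypothetical `update_step`/`is_training_stat` helpers, using `EarlyFilteredLogger` from LoggingExtras.jl:

```julia
using Logging, LoggingExtras

const DEFAULT_GROUP = :rl_training   # hypothetical; any distinctive symbol works

# Inside a policy/learner update, intermediate statistics become plain log
# calls, so no cache fields and no extra hooks are needed:
function update_step(loss, entropy)
    @debug "ppo/loss" loss entropy _group = DEFAULT_GROUP
end

# The early filter only sees (level, _module, group, id), before the message
# and its key=value pairs are materialized, so unrelated logs are dropped cheaply.
is_training_stat(log) = log.group === DEFAULT_GROUP

stats_logger = EarlyFilteredLogger(is_training_stat, ConsoleLogger(stderr, Logging.Debug))

with_logger(stats_logger) do
    update_step(0.42f0, 0.01f0)
end
```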
Note that:
- Here we prefer `@debug` to avoid printing those statistics with the default logger.
- `_group` is required so that we can easily distinguish them from other logs.
- A default filter must be provided (maybe simply reuse `EarlyFilteredLogger`?).
- `Wandb.jl` and `TensorBoardLogger.jl` should be supported out of the box as sinks. (TODO: add some examples in docs.)
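On the sink side, here is a sketch of how TensorBoardLogger.jl could be wired in, reusing the same hypothetical `DEFAULT_GROUP` as above. `TBLogger` is itself an `AbstractLogger` that writes each key=value pair of a log record as a scalar; the `min_level` keyword is assumed to be lowered so `@debug` records reach it:

```julia
using Logging, LoggingExtras, TensorBoardLogger

const DEFAULT_GROUP = :rl_training   # hypothetical; must match the @debug calls

# Lower the sink's minimum level so that @debug records are accepted.
tb = TBLogger("tensorboard_logs", min_level = Logging.Debug)

# Only training statistics reach TensorBoard.
sink = EarlyFilteredLogger(log -> log.group === DEFAULT_GROUP, tb)

# TeeLogger lets the training statistics go to TensorBoard while everything
# else still reaches the normal global logger.
logger = TeeLogger(global_logger(), sink)

with_logger(logger) do
    @debug "ppo/loss" loss = 0.42 _group = DEFAULT_GROUP   # -> TensorBoard scalar
    @info "unrelated message"                              # -> console as usual
end
```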