stable-baselines3
Also log hyperparameters to the tensorboard
🚀 Feature
It would be nice if the hyperparameters used for a run were included in the tensorboard log, for better comparability.
Motivation
Manually comparing hyperparameters kept in a separate location isn't as clean, and tensorboard already supports logging them. This would make use of its full potential.
Additional context
https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_hparams
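For reference, a minimal sketch of how the linked add_hparams API is called (the values below are just placeholders, not something SB3 produces today):

```python
from torch.utils.tensorboard import SummaryWriter

with SummaryWriter() as writer:
    writer.add_hparams(
        {"learning_rate": 3e-4, "batch_size": 64},  # hyperparameters of the run
        {"hparam/final_reward": 200.0},             # metrics shown next to them in the HPARAMS tab
    )
```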
### Checklist
- [x] I have checked that there is no similar issue in the repo (required)
Definitely sounds like a useful feature. The largest part here would be to figure out where these hyperparameters should be logged and what should be fed in. This sounds like it should include stuff from the algorithm level (e.g. PPO, SAC) and the policy-class level (OnPolicyAlgorithm and OffPolicyAlgorithm). Sounds like we need a get_hyperparameters function for each algorithm that returns a dictionary of hyperparameters and their values.
@araffin comments?
Hello, you forgot to fill in the "alternatives" section ;)
As an alternative, you can easily define a callback since https://github.com/DLR-RM/stable-baselines3/issues/286 was merged. It is also in the documentation: https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html#directly-accessing-the-summary-writer.
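For illustration, a minimal sketch of such a callback that writes a few hyperparameters directly to the SummaryWriter, assuming tensorboard logging is enabled and the values are plain scalars (not schedules). The HParamCallback name and the logged keys are just examples, not part of the SB3 API:

```python
from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.logger import TensorBoardOutputFormat


class HParamCallback(BaseCallback):
    """Logs a handful of hyperparameters once, at the start of training."""

    def _on_training_start(self) -> None:
        # Grab the tensorboard writer from the logger's output formats
        # (raises StopIteration if tensorboard logging is not enabled)
        self.tb_formatter = next(
            f for f in self.logger.output_formats if isinstance(f, TensorBoardOutputFormat)
        )
        self.tb_formatter.writer.add_hparams(
            {
                "algo": self.model.__class__.__name__,
                "learning_rate": self.model.learning_rate,  # assumed constant, not a schedule
                "gamma": self.model.gamma,
            },
            {"hparam/placeholder": 0.0},  # add_hparams requires a metric dict
        )
        self.tb_formatter.writer.flush()

    def _on_step(self) -> bool:
        return True
```

It would then be passed as usual, e.g. model.learn(..., callback=HParamCallback()).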
Sounds like we need a get_hyperparameters function for each algorithm that returns a dictionary of hyperparameters and their values.
Well, this is the part I'm not a big fan of...
It is also in the documentation: https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html#directly-accessing-the-summary-writer.
Ah this sounds like a good approach to this, so users can do their own logging.
Well, this is the part I'm not a big fan of...
Yeeeah this would get very long and error-prone (something is updated/added but then not included in this list, etc), so I would also rather avoid it if possible ^^
If I understand you correctly you suggest that I register a callback myself with the hyperparameters I need?
For context: I use the tools provided by the zoo repo to run e.g. PPO in my env from the command line. They run fine out of the box, and it would be nice if e.g. the PPO hyperparameters as defined in my yaml were logged without further interaction with the code. Other values like loss, reward, ... are also logged without further interaction. I think logging crucial hyperparameters like epochs, learning_rate, batch_size is on a similar level as logging the other standard metrics during training. Adding fancy stuff by hand using this callback is cool, but I think this is more basic. But maybe this is more of an issue for the zoo repo and the callback should be implemented there?
Sorry if there are misconceptions on my side, I am not too familiar with the exact internal structure of this repo. ^^
Hmm, you raise a good point! I have not used the newest version of the zoo, but at least in the past it did not log all the parameters anywhere (you only have the .yml file that updates parameters). Sounds like this could be an enhancement for the zoo, @araffin?
If I understand you correctly you suggest that I register a callback myself with the hyperparameters I need?
Yes, adding callbacks is in fact included in the rl zoo (I just updated the doc today): https://github.com/DLR-RM/rl-baselines3-zoo#callbacks
They run fine out of the box, and it would be nice if e.g. the PPO hyperparameters as defined in my yaml were logged without further interaction with the code.
The ones defined in the yaml file are saved, but not sent to tensorboard.
Adding fancy stuff by hand using this callback is cool, but I think this is more basic.
Adding a callback is quite basic (I think it should be only ~10 lines of code from what you are describing).
As a follow-up, you can take a look at what is done in the wandb callback: https://github.com/wandb/client/blob/master/wandb/integration/sb3/sb3.py (where all hyperparameters are saved)
Documentation: https://gitbook-docs.wandb.ai/guides/integrations/other/stable-baselines-3
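For reference, a rough usage sketch following the linked wandb documentation (the exact arguments may differ between wandb versions, and the project name is just an example):

```python
import wandb
from wandb.integration.sb3 import WandbCallback

from stable_baselines3 import PPO

# sync_tensorboard mirrors SB3's tensorboard logs into the wandb run
run = wandb.init(project="sb3-example", sync_tensorboard=True)

model = PPO("MlpPolicy", "CartPole-v1", verbose=1, tensorboard_log=f"runs/{run.id}")
# WandbCallback also stores the model's hyperparameters in the run config
model.learn(total_timesteps=10_000, callback=WandbCallback(verbose=2))
run.finish()
```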
Are there any plans to add an integration with MLflow too?
@EloyAnguiano have a look:
import sys
from typing import Any, Dict, Tuple, Union

import gym
import mlflow
import numpy as np

from stable_baselines3 import SAC
from stable_baselines3.common.logger import HumanOutputFormat, KVWriter, Logger


class MLflowOutputFormat(KVWriter):
    """Dumps key/value pairs into MLflow's numeric format."""

    def write(self, key_values: Dict[str, Any], key_excluded: Dict[str, Union[str, Tuple[str, ...]]], step: int = 0) -> None:
        for (key, value), (_, excluded) in zip(sorted(key_values.items()), sorted(key_excluded.items())):
            # Skip entries explicitly excluded from the "mlflow" output format
            if excluded is not None and "mlflow" in excluded:
                continue
            # MLflow metrics must be numeric scalars, so skip strings
            if isinstance(value, np.ScalarType):
                if not isinstance(value, str):
                    mlflow.log_metric(key, value, step)


env = gym.make("Pendulum-v1")
loggers = Logger(folder=None, output_formats=[HumanOutputFormat(sys.stdout), MLflowOutputFormat()])

with mlflow.start_run():
    model = SAC("MlpPolicy", env, verbose=2)
    # Use the custom logger so that metrics are also sent to MLflow
    model.set_logger(loggers)
    model.learn(total_timesteps=10000, log_interval=1)
I will provide a PR.
@EloyAnguiano please have a look at PR #889
Related https://github.com/hill-a/stable-baselines/issues/1128#issuecomment-1124794750 (implementation by @tim99oth99e using tensorboard package directly and SB3 callback)
It is also in the documentation: https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html#directly-accessing-the-summary-writer.
For non-advanced users, it might not be so clear that hyperparameters can be logged in this way (and directly accessing the summary writer is not recommended). I think it could be useful to add a Logging Hyperparameters section to the tensorboard integration documentation.
For this, we could define a new data class Hparam in logger.py and add support for this class in the write method of the TensorBoardOutputFormat class (in the same way as Video or Figure).
e.g.
if isinstance(value, Hparam):
    self.writer.add_hparams(value.hparams, value.metrics)
Then add a Logging Hyperparameters section to https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html, with an example of how to use it (in the same way as Logging Images, for example).
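For concreteness, a rough sketch of what such a data class and its usage could look like. The attribute names follow the snippet above and are not part of the current SB3 API:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Hparam:
    """Hyperparameters plus the metrics tensorboard displays next to them."""

    hparams: Dict[str, Any]
    metrics: Dict[str, float]


# Possible usage from inside a callback or after model creation, excluding
# output formats that would not know how to handle this value type:
# self.logger.record(
#     "hparams",
#     Hparam({"learning_rate": 3e-4, "gamma": 0.99}, {"rollout/ep_rew_mean": 0.0}),
#     exclude=("stdout", "log", "json", "csv"),
# )
```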
I'd be pleased to implement this.
@tim99oth99e
Please go ahead ;) (don't forget to read the contributing guide and the PR checklist)
Hi guys, could you tell me what the best practice is between using a Logger output such as KVWriter vs. using callbacks? When should we use one over the other? Thanks.
Hi guys, could you tell me what the best practice is between using a Logger output such as KVWriter vs. using callbacks?
Use callbacks whenever possible. KVWriter is only meant to add a new output format (for instance tensorboard, MLflow, writing to a file, ...). In the end, the logger is what gets used either way.