stable-baselines3
Also log hyperparameters to the tensorboard
🚀 Feature
It would be nice if the hyperparameters used for a run were included in the tensorboard log, for better comparability.
Motivation
Manually comparing hyperparameters kept in a separate location isn't as clean, and tensorboard already supports logging them. This would make use of its full potential.
Additional context
https://pytorch.org/docs/stable/tensorboard.html#torch.utils.tensorboard.writer.SummaryWriter.add_hparams
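For reference, a minimal sketch of how the linked add_hparams API is called (the values below are just placeholders, not something SB3 produces today):

```python
from torch.utils.tensorboard import SummaryWriter

with SummaryWriter() as writer:
    writer.add_hparams(
        {"learning_rate": 3e-4, "batch_size": 64},  # hyperparameters of the run
        {"hparam/final_reward": 200.0},             # metrics shown next to them in the HPARAMS tab
    )
```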
### Checklist
- [x] I have checked that there is no similar issue in the repo (required)
Definitely sounds like a useful feature. The largest part here would be to figure out where these hyperparameters should be logged and what should be fed in. This sounds like it should include stuff from the algorithm level (e.g. PPO, SAC) and the policy-class level (OnPolicyAlgorithm and OffPolicyAlgorithm). Sounds like we need a get_hyperparameters function for each algorithm that returns a dictionary of hyperparameters and their values.
@araffin comments?
Hello, you forgot to fill in the "alternatives" section ;)
As an alternative, you can easily define a callback since https://github.com/DLR-RM/stable-baselines3/issues/286 was merged. It is also in the documentation: https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html#directly-accessing-the-summary-writer.
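For illustration, a minimal sketch of such a callback that writes a few hyperparameters directly to the SummaryWriter, assuming tensorboard logging is enabled and the values are plain scalars (not schedules). The HParamCallback name and the logged keys are just examples, not part of the SB3 API:

```python
from stable_baselines3.common.callbacks import BaseCallback
from stable_baselines3.common.logger import TensorBoardOutputFormat


class HParamCallback(BaseCallback):
    """Logs a handful of hyperparameters once, at the start of training."""

    def _on_training_start(self) -> None:
        # Grab the tensorboard writer from the logger's output formats
        # (raises StopIteration if tensorboard logging is not enabled)
        self.tb_formatter = next(
            f for f in self.logger.output_formats if isinstance(f, TensorBoardOutputFormat)
        )
        self.tb_formatter.writer.add_hparams(
            {
                "algo": self.model.__class__.__name__,
                "learning_rate": self.model.learning_rate,  # assumed constant, not a schedule
                "gamma": self.model.gamma,
            },
            {"hparam/placeholder": 0.0},  # add_hparams requires a metric dict
        )
        self.tb_formatter.writer.flush()

    def _on_step(self) -> bool:
        return True
```

It would then be passed as usual, e.g. model.learn(..., callback=HParamCallback()).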
Sounds like we need a get_hyperparameters function for each algorithm that returns a dictionary of hyperparameters and their values.
Well, this is the part I'm not a big fan of...
It is also in the documentation: https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html#directly-accessing-the-summary-writer.
Ah this sounds like a good approach to this, so users can do their own logging.
Well, this is the part I'm not a big fan of...
Yeeeah this would get very long and error-prone (something is updated/added but then not included in this list, etc), so I would also rather avoid it if possible ^^
If I understand you correctly you suggest that I register a callback myself with the hyperparameters I need?
For context: I use the tools provided by the zoo repo to run e.g. PPO in my env from the command line. They run fine out of the box, and it would be nice if e.g. the PPO hyperparameters as defined in my yaml were logged without further interaction with the code. Other values like loss, reward, ... are also logged without further interaction. I think logging crucial hyperparameters like epochs, learning_rate, batch_size is on a similar level as logging the other standard metrics during training. Adding fancy stuff by hand using this callback is cool, but I think this is more basic. But maybe this is more of an issue for the zoo repo and the callback should be implemented there?
Sorry if there are misconceptions on my side, I am not too familiar with the exact internal structure of this repo. ^^
Hmm, you raise a good point! I have not used the newest version of the zoo, but at least in the past it did not log all the parameters anywhere (you only have the .yml file that updates parameters). Sounds like this could be an enhancement for the zoo, @araffin?
If I understand you correctly you suggest that I register a callback myself with the hyperparameters I need?
Yes, adding callbacks is in fact included in the rl zoo (I just updated the doc today): https://github.com/DLR-RM/rl-baselines3-zoo#callbacks
They run fine out of the box, and it would be nice if e.g. the PPO hyperparameters as defined in my yaml were logged without further interaction with the code.
The ones defined in the yaml file are saved, but not sent to tensorboard.
Adding fancy stuff by hand using this callback is cool, but I think this is more basic.
Adding a callback is quite basic (I think it should be only ~10 lines of code from what you are describing).
As a follow-up, you can take a look at what is done in the wandb callback: https://github.com/wandb/client/blob/master/wandb/integration/sb3/sb3.py (where all hyperparameters are saved)
Documentation: https://gitbook-docs.wandb.ai/guides/integrations/other/stable-baselines-3
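For reference, a rough usage sketch following the linked wandb documentation (the exact arguments may differ between wandb versions, and the project name is just an example):

```python
import wandb
from wandb.integration.sb3 import WandbCallback

from stable_baselines3 import PPO

# sync_tensorboard mirrors SB3's tensorboard logs into the wandb run
run = wandb.init(project="sb3-example", sync_tensorboard=True)

model = PPO("MlpPolicy", "CartPole-v1", verbose=1, tensorboard_log=f"runs/{run.id}")
# WandbCallback also stores the model's hyperparameters in the run config
model.learn(total_timesteps=10_000, callback=WandbCallback(verbose=2))
run.finish()
```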
Are there any plans to add an integration with MLflow too?
@EloyAnguiano have a look:
import sys
from typing import Any, Dict, Tuple, Union

import gym
import mlflow
import numpy as np

from stable_baselines3 import SAC
from stable_baselines3.common.logger import HumanOutputFormat, KVWriter, Logger


class MLflowOutputFormat(KVWriter):
    """Dumps key/value pairs into MLflow's numeric format."""

    def write(self, key_values: Dict[str, Any], key_excluded: Dict[str, Union[str, Tuple[str, ...]]], step: int = 0) -> None:
        for (key, value), (_, excluded) in zip(sorted(key_values.items()), sorted(key_excluded.items())):
            # Skip entries explicitly excluded from the "mlflow" output format
            if excluded is not None and "mlflow" in excluded:
                continue
            # MLflow metrics must be numeric scalars, so skip strings
            if isinstance(value, np.ScalarType):
                if not isinstance(value, str):
                    mlflow.log_metric(key, value, step)


env = gym.make("Pendulum-v1")
loggers = Logger(folder=None, output_formats=[HumanOutputFormat(sys.stdout), MLflowOutputFormat()])

with mlflow.start_run():
    model = SAC("MlpPolicy", env, verbose=2)
    # Use the custom logger so that metrics are also sent to MLflow
    model.set_logger(loggers)
    model.learn(total_timesteps=10000, log_interval=1)
I will provide a PR.
@EloyAnguiano please have a look at PR #889
Related https://github.com/hill-a/stable-baselines/issues/1128#issuecomment-1124794750 (implementation by @tim99oth99e using tensorboard package directly and SB3 callback)
It is also in the documentation: https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html#directly-accessing-the-summary-writer.
For non-advanced users, it might not be so clear that hyperparameters can be logged in this way (and directly accessing the summary writer is not recommended). I think it could be useful to add a Logging Hyperparameters section to the tensorboard integration documentation.
For this, we could define a new data class Hparam in logger.py and add support for this class in the write method of the TensorBoardOutputFormat class (in the same way as Video or Figure).
e.g.
if isinstance(value, Hparam):
    self.writer.add_hparams(value.hparams, value.metrics)
Then add a Logging Hyperparameters section to https://stable-baselines3.readthedocs.io/en/master/guide/tensorboard.html, with an example of how to use it (in the same way as Logging Images, for example).
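For concreteness, a rough sketch of what such a data class and its usage could look like. The attribute names follow the snippet above and are not part of the current SB3 API:

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Hparam:
    """Hyperparameters plus the metrics tensorboard displays next to them."""

    hparams: Dict[str, Any]
    metrics: Dict[str, float]


# Possible usage from inside a callback or after model creation, excluding
# output formats that would not know how to handle this value type:
# self.logger.record(
#     "hparams",
#     Hparam({"learning_rate": 3e-4, "gamma": 0.99}, {"rollout/ep_rew_mean": 0.0}),
#     exclude=("stdout", "log", "json", "csv"),
# )
```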
I'd be pleased to implement this.
@tim99oth99e
Please go ahead ;) (don't forget to read the contributing guide and the PR checklist)
Hi guys, could you tell me what the best practice is between using a Logger output such as KVWriter vs. using callbacks? When should we use one over the other? Thanks.
Hi guys, could you tell me what the best practice is between using a Logger output such as KVWriter vs. using callbacks?
Use callbacks whenever possible. KVWriter is only meant to add a new output format (for instance tensorboard, MLflow, writing to a file, ...). In the end, the logger is what gets used either way.