[Bug]: Missing metrics when logging hyperparameters on tensorboard
🐛 Bug
When I try to log metrics related to hyperparameters to TensorBoard, the metric values are not stored.
To Reproduce
from stable_baselines3.common.logger import configure, HParam
tmp_path = "log_bug"
# set up logger
new_logger = configure(tmp_path, ["tensorboard"])
hp = HParam({"hparam": 1.0}, {"missing_metric": 2.0})
new_logger.record("hparams", hp)
new_logger.dump()
Relevant log output / Error message
No response
System Info
OS: Linux-5.15.79.1-microsoft-standard-WSL2-x86_64-with-glibc2.29 #1 SMP Wed Nov 23 01:01:46 UTC 2022
Python: 3.8.10
Stable-Baselines3: 1.6.2
PyTorch: 1.13.1+cu117
GPU Enabled: False
Numpy: 1.24.1
Gym: 0.21.0
Checklist
- [X] I have checked that there is no similar issue in the repo
- [X] I have read the documentation
- [X] I have provided a minimal working example to reproduce the bug
- [X] I've used the markdown code blocks for both code and stack traces.
I have the same issue when trying to log hparams metrics.
@timothe-chaumont as you did the implementation in https://github.com/DLR-RM/stable-baselines3/pull/984, could you have a look?
new_logger.dump()
I would expect `dump(num_timesteps)` there.
You are right @rogierz: metric values passed to HParam through the metric_dict won't be saved. They are supposed to reference metrics that have been logged separately (otherwise they won't be displayed in the HPARAMS tab).
In the documentation, the example mentions:
# define the metrics that will appear in the `HPARAMS` Tensorboard tab by referencing their tag
# TensorBoard will find & display metrics from the `SCALARS` tab
metric_dict = {
"rollout/ep_len_mean": 0,
"train/value_loss": 0.0,
}
So in your example you would need to log your custom metric with
new_logger.record("missing_metric", 2.0)
so that, when it is referenced in HParam, TensorBoard finds it and adds it to the HPARAMS tab.

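For completeness, here is a minimal sketch of the corrected snippet (the exclude tuple mirrors the documentation example and keeps the HParam object out of the text-based writers; treat the exact values as illustrative):

from stable_baselines3.common.logger import configure, HParam

tmp_path = "log_bug"
new_logger = configure(tmp_path, ["tensorboard"])

# Log the metric itself first, so that it exists in the SCALARS tab.
new_logger.record("missing_metric", 2.0)

# Reference the already-logged metric by its tag; the value here is only
# a placeholder that TensorBoard replaces with the logged scalar.
hp = HParam({"hparam": 1.0}, {"missing_metric": 0.0})
new_logger.record("hparams", hp, exclude=("stdout", "log", "json", "csv"))
new_logger.dump()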
IMHO I might have an idea where this bug comes from. In HumanOutputFormat, for example, logger.dump iterates over all keys to output, but the iteration zips them together with key_excluded. Note that nothing ensures that a logger's name_to_value and name_to_excluded have the same length (they are public, and one might want to use them for more specific logging than is possible with record/record_mean alone).
I would suggest the following patch @ https://github.com/DLR-RM/stable-baselines3/blob/a9273f968eaf8c6e04302a07d803eebfca6e7e86/stable_baselines3/common/logger.py#L179:
- for (key, value), (_, excluded) in zip(sorted(key_values.items()), sorted(key_excluded.items())):
+ for key, value in sorted(key_values.items()):
+     excluded = key_excluded.get(key, ('',))
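To illustrate the misalignment with a standalone sketch (hypothetical data, not SB3 internals): zip pairs the two sorted dicts positionally and truncates at the shorter one, whereas the per-key lookup visits every logged key:

# Two keys were logged, but only one has an exclusion entry.
key_values = {"hparams": "<HParam>", "missing_metric": 2.0}
key_excluded = {"hparams": ("stdout",)}

# Current zip-based pairing: iteration stops at the shorter dict,
# so "missing_metric" is never written at all.
for (key, value), (_, excluded) in zip(sorted(key_values.items()), sorted(key_excluded.items())):
    print(key, "->", excluded)  # only "hparams" is visited

# Proposed per-key lookup: every key is visited, with a safe default.
for key, value in sorted(key_values.items()):
    excluded = key_excluded.get(key, ("",))
    print(key, "->", excluded)  # both keys are visited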
Similar changes are needed, e.g., for the TensorboardOutputFormat @ https://github.com/DLR-RM/stable-baselines3/blob/a9273f968eaf8c6e04302a07d803eebfca6e7e86/stable_baselines3/common/logger.py#L404
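Assuming the write loop there follows the same zip pattern, the analogous fix would be (a hypothetical abridged sketch; the per-type handling from the linked revision is unchanged and elided):

def write(self, key_values, key_excluded, step=0):
    # Look exclusions up by key instead of zipping the two dicts.
    for key, value in sorted(key_values.items()):
        excluded = key_excluded.get(key, ('',))
        if excluded is not None and "tensorboard" in excluded:
            continue
        # ... existing handling of `value` (scalars, HParam, etc.) ...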