Tensorboard metrics not loaded properly
I have the problem that TensorBoard does not load my metrics correctly (the metric column in the HPARAMS tab is always empty), although the scalars themselves are saved correctly. I am working with torch.utils.tensorboard.
Relevant code:
writer = SummaryWriter(log_dir=f'./logs/studies/{study_name}/')
In the training loop:
writer.add_scalar(tag='validation/min_loss', scalar_value=min_val_loss, global_step=trial.number)
Add the hyperparameters to the summary writer (args_dict is a dictionary with all hyperparameters):
writer.add_hparams(hparam_dict=args_dict, metric_dict={'validation/min_loss': min_val_loss}, run_name=run_name)
writer.close()
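For context, here is a minimal self-contained sketch of the same logging pattern (the values of study_name, args_dict, and min_val_loss are placeholders standing in for my actual training loop):

from torch.utils.tensorboard import SummaryWriter

study_name = 'demo_study'                    # placeholder for the real study name
args_dict = {'lr': 1e-3, 'batch_size': 64}   # placeholder hyperparameters
min_val_loss = 0.42                          # placeholder best validation loss

writer = SummaryWriter(log_dir=f'./logs/studies/{study_name}/')
# Per-trial scalar (global_step is the trial number in the real code)
writer.add_scalar(tag='validation/min_loss', scalar_value=min_val_loss, global_step=0)
# Hyperparameters plus the metric, keyed by the same tag as the scalar above
writer.add_hparams(hparam_dict=args_dict, metric_dict={'validation/min_loss': min_val_loss}, run_name='trial_0')
writer.close()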
Are the metrics showing up in the Time Series or Scalars tabs? Did you try selecting the "show metrics" checkboxes?
The scalars associated with the metrics are loaded correctly in both the TIME SERIES and SCALARS tabs. The only problem is that no metrics are displayed in the HPARAMS tab. When I select the "show metrics" checkboxes, a completely empty chart pops up.
Wow, that is strange! I do not see why that would happen, and I cannot seem to reproduce it. Is this happening with other logs or just this one?
Yes, it's weird. It doesn't seem to be a problem with these specific logs only; I've also used other scalars as metrics, but that didn't change the result. It is perhaps also noteworthy that I encountered exactly the same problem with a completely different implementation, namely the code from the official guide to hyperparameter optimization with TensorBoard (a TensorFlow implementation). The scalars were displayed correctly in the TIME SERIES and SCALARS tabs, but the column of the corresponding metric "Accuracy" in the HPARAMS tab remained empty.
Related code (from the official guide):
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

fashion_mnist = tf.keras.datasets.fashion_mnist
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

HP_NUM_UNITS = hp.HParam('num_units', hp.Discrete([16, 32]))
HP_DROPOUT = hp.HParam('dropout', hp.RealInterval(0.1, 0.2))
HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd']))

METRIC_ACCURACY = 'accuracy'

with tf.summary.create_file_writer('logs/hparam_tuning').as_default():
  hp.hparams_config(
      hparams=[HP_NUM_UNITS, HP_DROPOUT, HP_OPTIMIZER],
      metrics=[hp.Metric(METRIC_ACCURACY, display_name='Accuracy')],
  )

def train_test_model(hparams):
  model = tf.keras.models.Sequential([
      tf.keras.layers.Flatten(),
      tf.keras.layers.Dense(hparams[HP_NUM_UNITS], activation=tf.nn.relu),
      tf.keras.layers.Dropout(hparams[HP_DROPOUT]),
      tf.keras.layers.Dense(10, activation=tf.nn.softmax),
  ])
  model.compile(
      optimizer=hparams[HP_OPTIMIZER],
      loss='sparse_categorical_crossentropy',
      metrics=['accuracy'],
  )
  model.fit(x_train, y_train, epochs=1)  # Run with 1 epoch to speed things up for demo purposes
  _, accuracy = model.evaluate(x_test, y_test)
  return accuracy

def run(run_dir, hparams):
  with tf.summary.create_file_writer(run_dir).as_default():
    hp.hparams(hparams)  # record the values used in this trial
    accuracy = train_test_model(hparams)
    tf.summary.scalar(METRIC_ACCURACY, accuracy, step=1)

session_num = 0

for num_units in HP_NUM_UNITS.domain.values:
  for dropout_rate in (HP_DROPOUT.domain.min_value, HP_DROPOUT.domain.max_value):
    for optimizer in HP_OPTIMIZER.domain.values:
      hparams = {
          HP_NUM_UNITS: num_units,
          HP_DROPOUT: dropout_rate,
          HP_OPTIMIZER: optimizer,
      }
      run_name = "run-%d" % session_num
      print('--- Starting trial: %s' % run_name)
      print({h.name: hparams[h] for h in hparams})
      run('logs/hparam_tuning/' + run_name, hparams)
      session_num += 1
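To view the resulting dashboard, I then point TensorBoard at the parent log directory, along the lines of:

tensorboard --logdir logs/hparam_tuning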
Is it possible for you to send me your log files?
Sure. But since I'm currently on vacation, I can't do this until the beginning of next week.
Hey Tim, thanks for sending me your logs. Unfortunately, I still cannot reproduce the issue. I ran these commands:
pip install --upgrade pip
pip install tensorboard
tensorboard --logdir ./your/log/dir
What version of TensorBoard are you running?
pip freeze | grep tensorboard
tensorboard==2.8.0
tensorboard-data-server==0.6.1
Hey James, the TensorBoard versions were indeed the deciding factor. I had the newer versions

tensorboard 2.17.1
tensorboard-data-server 0.7.2

installed. Downgrading to

tensorboard 2.8.0
tensorboard-data-server 0.6.1

solved the problem, and all metrics were displayed correctly. Thank you very much for your help!

One more note: I also installed today's release,

tensorboard 2.18.0
tensorboard-data-server 0.7.2

and the problem still exists with these versions.
Thanks! Had the same issue here.
Works for me with tensorboard==2.16.2 (and tensorboard-data-server==0.7.2).
Can this issue be re-opened? The issue still persists with the current version 2.18.0. It works for me in 2.16.2, not in 2.17.0.
I'm also having the same issue with tensorboard==2.18.0 and tensorboard-data-server==0.7.2.
I have the same issue with tensorboard==2.19.0. Is anyone working on a fix?
Are the people experiencing this writing their data with a library other than TensorFlow (e.g. PyTorch)?
I ran the script provided above with TB 2.19.0, and I can see the Accuracy metric displayed as expected.
Since somebody reported that this was reproducible starting with 2.17.0, I suspected #6822 (from the 2.17 release notes) could have something to do with it, depending on whether the event files are located directly in the logdir passed to the tensorboard command, but I was not able to reproduce the problem.
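To illustrate what I mean (directory and file names are made up): the question is whether the event files sit directly in the logdir passed on the command line, e.g.

logs/events.out.tfevents.*

or in per-run subdirectories, e.g.

logs/run-0/events.out.tfevents.*
logs/run-1/events.out.tfevents.*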
I still experience this on TB 2.19.0. I write my data like this:
from torch.utils.tensorboard import SummaryWriter
w = SummaryWriter("hparams/results")
w.add_hparams(
    {"lr": 1e-3, "batch_size": 4096},
    {"episode_return_mean": 123.4},
)
w.close()
Using torch==2.8.0 and TB==2.19.0, I do not get any values in my episode_return_mean column in the HPARAMS tab of TensorBoard. Downgrading TB to 2.16.2 fixes it, but then I must downgrade protobuf as well, which I cannot do because of other dependencies.
I copied all the code from the hparams demo Colab and ran it with my TB 2.19, and encountered this problem. It works fine on Colab (TB 2.18).
I encountered the same issue on TB 2.19.0. I tried several versions:
TB 2.19.0: display issues
TB 2.18.0: display issues
TB 2.17.1: display issues
TB 2.17.0: display issues
TB 2.16.2: normal display
Downgrading TensorBoard to 2.16.2 resolved the issue; at least TB is functional now. I'm not sure how to fix or further troubleshoot this, but please let me know if you need more information.
I am also experiencing this same issue. My environment is Python 3.12 through Miniconda on Windows 10 Pro 19045, with TensorBoard 2.19.0. Note that I do not have this issue when running TensorBoard on Debian 12, also with TensorBoard 2.19.0. Even more surprisingly, mounting my Windows directory into WSL Ubuntu 22 and running TensorBoard from there also works.
At least in my case, this therefore seems to be a Windows-specific issue.
It also seems to be purely a visualization issue, not a logging issue, because I can log the data with Python on Windows and visualize it without problems through WSL by mounting the Windows-generated TensorBoard logs. This behavior was tested with both TensorFlow logging and PyTorch torch.utils.tensorboard logging.
@timr1101 @lebeand @kevinunger @dlindmark @0523ronli @XuRainbow Can you confirm if you were using Windows when encountering the issue?