hparams table not getting displayed when many hparams are being used

Open asorie opened this issue 5 years ago • 15 comments

Consider Stack Overflow for getting support using TensorBoard—they have a larger community with better searchability:

https://stackoverflow.com/questions/tagged/tensorboard

Do not use this template for setup, installation, or configuration issues. Instead, use the “installation problem” issue template:

https://github.com/tensorflow/tensorboard/issues/new?template=installation_problem.md

To report a problem with TensorBoard itself, please fill out the remainder of this template.

Environment information (required)

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version 4725c70c7ed724e2d1b9ba5618d7c30b957ee8a4

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)
INFO: os.name: nt
INFO: os.uname(): N/A
INFO: sys.getwindowsversion(): sys.getwindowsversion(major=10, minor=0, build=14393, platform=2, service_pack='')

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: 'C:\\tensorflow_anduin'

--- check: installed_packages
INFO: installed: tensorboard==2.0.0
INFO: installed: tensorflow-gpu==2.0.0
INFO: installed: tensorflow-estimator==2.0.0

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.0.0'

--- check: tensorflow_python_version
2019-10-08 14:40:42.620638: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
INFO: tensorflow.__version__: '2.0.0'
INFO: tensorflow.__git_version__: 'v2.0.0-rc2-26-g64c3d382ca'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'C:\\tensorflow_anduin\\Scripts\\tensorboard.exe\r\n'

--- check: readable_fqdn
INFO: socket.getfqdn(): '...'

--- check: stat_tensorboardinfo
INFO: directory: C:\Users\halle\AppData\Local\Temp\.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=3096224744103339, st_dev=2217911477, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1570538160, st_mtime=1570538160, st_ctime=1562760637)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['C:\\tensorflow_anduin\\lib\\site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.7.1
adal==1.2.2
asn1crypto==0.24.0
astor==0.8.0
astroid==2.2.5
avro-python3==1.9.1
azure-common==1.1.23
azure-graphrbac==0.53.0
azure-keyvault==1.1.0
azure-mgmt-authorization==0.51.1
azure-mgmt-containerregistry==2.7.0
azure-mgmt-keyvault==1.1.0
azure-mgmt-msi==0.2.0
azure-mgmt-nspkg==3.0.2
azure-mgmt-resource==2.2.0
azure-mgmt-storage==3.1.1
azure-nspkg==3.0.2
azure-storage-blob==1.5.0
azure-storage-common==1.4.2
blinker==1.4
boto3==1.9.238
botocore==1.12.238
cachetools==3.1.1
certifi==2019.9.11
cffi==1.12.3
chardet==3.0.4
Click==7.0
click-completion==0.5.1
clipboard==0.0.4
colorama==0.3.9
cryptography==2.7
cycler==0.10.0
docker==3.7.3
docker-pycreds==0.4.0
docutils==0.15.2
Flask==1.1.1
flatten-json==0.1.7
gast==0.2.2
gitdb2==2.0.6
GitPython==2.1.14
google-api-core==1.14.2
google-auth==1.6.3
google-cloud-core==1.0.3
google-cloud-kms==1.2.1
google-cloud-storage==1.20.0
google-pasta==0.1.7
google-resumable-media==0.4.1
googleapis-common-protos==1.6.0
grpc-google-iam-v1==0.12.3
grpcio==1.22.0
h5py==2.9.0
httplib2==0.14.0
humanize==0.5.1
idna==2.8
imageio==2.5.0
isodate==0.6.0
isort==4.3.21
itsdangerous==1.1.0
Jinja2==2.10.1
jmespath==0.9.4
Keras==2.2.4
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.0
kiwisolver==1.1.0
lazy-object-proxy==1.4.1
lockfile==0.12.2
Markdown==3.1.1
MarkupSafe==1.1.1
matplotlib==3.1.1
mccabe==0.6.1
missinglink==19.9.26557
missinglink-kernel==19.9.26893
missinglink-sdk==19.9.26893
ml-core==19.9.3999
ml-crypto==0.7.811
ml-legit==19.9.8734
msgpack==0.6.2
msrest==0.6.10
msrestazure==0.6.2
mypy==0.711
mypy-extensions==0.4.1
natsort==6.0.0
netifaces==0.10.9
numpy==1.17.2
oauthlib==3.1.0
opt-einsum==2.3.2
pandas==0.25.1
patsy==0.5.1
pep8==1.7.1
Pillow==6.1.0
pip==19.2.3
ply==3.11
protobuf==3.8.0
psutil==5.6.3
puremagic==1.5
pyasn1==0.4.7
pyasn1-modules==0.2.6
pycparser==2.19
pycryptodome==3.6.6
Pygments==2.4.2
PyJWT==1.7.1
pylint==2.3.1
pyparsing==2.4.0
pyperclip==1.7.0
pypiwin32==223
python-dateutil==2.8.0
pytz==2019.2
pywin32==225
PyYAML==5.1.1
requests==2.22.0
requests-oauthlib==1.2.0
retrying==1.3.3
rope==0.14.0
rsa==4.0
s3transfer==0.2.1
scipy==1.3.0
sentry-sdk==0.11.2
setuptools==41.0.1
shellingham==1.3.1
six==1.12.0
smmap2==2.0.5
sseclient==0.0.24
statsmodels==0.10.1
tensorboard==2.0.0
tensorflow-estimator==2.0.0
tensorflow-gpu==2.0.0
termcolor==1.1.0
terminaltables==3.1.0
tqdm==4.32.2
typed-ast==1.4.0
urllib3==1.24.3
wcwidth==0.1.7
websocket-client==0.56.0
Werkzeug==0.16.0
wheel==0.33.4
wrapt==1.11.2

  • Browser: Chrome 76.0.3809.132

Issue description

If I use many hparams (e.g. 14) in TensorBoard, the table doesn't display any results, but the table head gets displayed correctly. [image]

But when I delete some of the rows in the HPARAMS section, the row in the hparams table and the accuracy get displayed correctly.

HPARAMS = [
    HP_BATCH_SIZE,
    HP_OPTIMIZER,
    HP_W_PARAM_0,
    HP_W_PARAM_1,
    HP_W_PARAM_2,
    HP_W_PARAM_3,
    HP_CONV1_FILTER,
    HP_CONV1_KERNEL,
    HP_CONV2_FILTER,
    HP_CONV2_KERNEL,
    HP_CONV3_FILTER,
    HP_CONV3_KERNEL,
    HP_Conv_UP_1_UNITS,
    HP_Conv_UP_2_UNITS,
]

with file_writer.as_default():
    hp.hparams_config(
        hparams=HPARAMS,
        metrics=METRICS,
    )
    hp.hparams(hparams)

asorie avatar Oct 08 '19 12:10 asorie

@Asorie Can you please try running this script in your notebook and let me know if you are facing the same issue?

gowthamkpr avatar Oct 08 '19 18:10 gowthamkpr

I tried the script, but in section 4 the line %tensorboard --logdir logs/hparam_tuning produces an error:

ERROR: Failed to launch TensorBoard (exited with 1).
Contents of stderr:
Traceback (most recent call last):
  File "/usr/local/bin/tensorboard", line 10, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/main.py", line 64, in run_main
    app.run(tensorboard.main, flags_parser=tensorboard.configure)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 220, in main
    server = self._make_server()
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/program.py", line 299, in _make_server
    self.assets_zip_provider)
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/backend/application.py", line 160, in standard_tensorboard_wsgi
    flags, plugin_loaders, data_provider, assets_zip_provider, multiplexer)
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/backend/application.py", line 228, in TensorBoardWSGIApp
    return TensorBoardWSGI(tbplugins, flags.path_prefix)
  File "/usr/local/lib/python3.6/dist-packages/tensorboard/backend/application.py", line 279, in __init__
    raise ValueError('Duplicate plugins for name %s' % plugin.plugin_name)
ValueError: Duplicate plugins for name projector

asorie avatar Oct 09 '19 07:10 asorie

It's because there might be multiple versions of TensorBoard on your system. Please find my GitHub gist here

I am able to see all the hyperparameters in TensorBoard using TensorFlow 2.0. There might be an issue with your TensorBoard installation. Please try to run the same script on your system and see whether the hparams are displayed or not. Thanks!
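
As a rough way to check the "multiple versions" theory behind the `Duplicate plugins for name projector` error, the sketch below lists which TensorBoard distributions are installed in the current environment. The helper name `find_tensorboard_distributions` and the candidate package list are assumptions for illustration and are not taken from the gist; it needs Python 3.8+ for `importlib.metadata`.

```python
from importlib import metadata

# Assumed list of distribution names that can each ship a copy of TensorBoard.
CANDIDATES = ("tensorboard", "tb-nightly", "tensorflow-tensorboard")


def find_tensorboard_distributions(candidates=CANDIDATES):
    """Return {distribution name: version} for every candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; skip it.
            continue
    return found


if __name__ == "__main__":
    installed = find_tensorboard_distributions()
    for name, version in installed.items():
        print(f"{name}=={version}")
    if len(installed) > 1:
        print("More than one TensorBoard distribution is installed.")
```

If more than one entry is reported, uninstalling all of them and reinstalling a single one is a commonly suggested remedy.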

gowthamkpr avatar Oct 09 '19 18:10 gowthamkpr

The script works. I think the problem is that I tried to add new HP and write the logs to an already used tensorboard.

asorie avatar Oct 10 '19 06:10 asorie

Yes. So, I think the problem here is resolved?

gowthamkpr avatar Oct 10 '19 17:10 gowthamkpr

Not really. I think TensorBoard should look for the HPs used and add new ones to the table if a new HP is found.

asorie avatar Oct 14 '19 09:10 asorie

If this line: HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd'])) gets changed to: HP_OPTIMIZER = hp.HParam('optimizer', hp.Discrete(['adam', 'sgd', 'RMSprop'])) and the model is then trained to the same logdir, TensorBoard doesn't add this new model to the hparams table.

So it isn't possible to dynamically change the possible hparams in the same logdir?
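
One possible workaround, assuming the stale table comes from reusing a logdir whose hparams configuration was already written, is to derive the log directory from the hparam domains so that a changed domain always lands in a fresh directory. The sketch below is plain Python; `logdir_for_schema` and the dictionary layout are hypothetical and not part of the TensorBoard API, and whether this resolves the behavior described here is untested.

```python
import hashlib
import json
import os


def logdir_for_schema(base_dir, schema):
    """Return a log directory whose name depends on the hparam domains.

    `schema` maps each hparam name to its list of allowed values.
    """
    # Sort keys so that the same schema always serializes identically.
    canonical = json.dumps(schema, sort_keys=True)
    digest = hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:8]
    return os.path.join(base_dir, f"hparams-{digest}")


old = {"optimizer": ["adam", "sgd"]}
new = {"optimizer": ["adam", "sgd", "RMSprop"]}

print(logdir_for_schema("logs", old))
print(logdir_for_schema("logs", new))
```

The two printed paths differ, so runs with the extended optimizer domain would not share a directory with the older configuration.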

asorie avatar Oct 15 '19 06:10 asorie

I think I'm facing the same issue. Any updates here?

Yannick947 avatar Apr 09 '20 09:04 Yannick947

This issue, in particular, https://github.com/tensorflow/tensorboard/issues/2743#issuecomment-542057891, very much reminds me of #3597. There, the problem is that mixed-type (string + float, meaning some models use a string value, others a numerical value) parameters are all cast to string, but the filter in list_session_groups.py doesn't take that casting into account - it looks for 2.0 and doesn't find "2.0". As a result, only models with string parameter values are found - the other ones just don't show up. I have never used hp.HParam myself, so I cannot say if the two HP_OPTIMIZERs are seen as different types, but it sure feels like a similar issue.
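
The mismatch described here can be illustrated with a toy model in plain Python. This is not the code from list_session_groups.py; the function names and data are made up, and the sketch only mimics the idea that stored values are cast to strings while the filter still compares against the original float.

```python
def cast_mixed_column(values):
    """Cast every value to str if the column mixes strings and numbers."""
    has_str = any(isinstance(v, str) for v in values)
    has_num = any(isinstance(v, (int, float)) for v in values)
    if has_str and has_num:
        return [str(v) for v in values]
    return list(values)


def naive_filter(stored, allowed):
    """Keep stored values that compare equal to an allowed value as-is."""
    return [v for v in stored if v in allowed]


def cast_aware_filter(stored, allowed):
    """Compare on the string form, so 2.0 also matches "2.0"."""
    allowed_as_str = {str(a) for a in allowed}
    return [v for v in stored if str(v) in allowed_as_str]


raw = ["auto", 2.0]               # one run logs a string, another a float
stored = cast_mixed_column(raw)   # ["auto", "2.0"]
allowed = ["auto", 2.0]           # the filter still holds the float 2.0

print(naive_filter(stored, allowed))       # ['auto']
print(cast_aware_filter(stored, allowed))  # ['auto', '2.0']
```

In the naive version the run that logged a number silently disappears, which matches the symptom of rows missing from the table.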

bersbersbers avatar May 08 '20 16:05 bersbersbers

I'm having the same issue. I am using torch + PPO in RLlib, and only half of my hyperparams show up in TensorBoard.

NumberChiffre avatar Jun 04 '20 06:06 NumberChiffre

I've had the same issue with tensorboardX; the reason was that the metric name contained whitespace.
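
If whitespace in a metric name is the culprit, as in this report, one defensive option is to normalize names before logging them. The helper below is a hypothetical sketch in plain Python, not part of tensorboardX or TensorBoard.

```python
import re


def sanitize_metric_name(name):
    """Strip surrounding whitespace and replace inner runs with one underscore."""
    return re.sub(r"\s+", "_", name.strip())


print(sanitize_metric_name("val accuracy"))  # val_accuracy
```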

vrublack avatar Oct 19 '20 09:10 vrublack

I ran into the same issue: it appears when the number of hparams gets large.

ghost avatar Jun 28 '22 09:06 ghost

Still an issue for me. Really annoying. Anyone have a solution?

I guess I will try writing the hparams structure to a separate file and replacing it every time I change something. Not sure this works, though.

pinkponk avatar Oct 16 '23 09:10 pinkponk

The original issue description here suggests the issue appears when "many hparams" are used. Then later it seems to be that users are trying to "add new HP and write the logs to an already used tensorboard".

So I'm not sure I'm understanding what the issue is. Are you logging more hparams data to the same log dir, and you want TB to read it? Does starting tensorboard again like tensorboard --logdir path/to/logs show everything you want to see? Do you have a small example to reproduce the issue?

arcra avatar Oct 24 '23 21:10 arcra

@arcra it probably covers only one aspect of this issue, but https://github.com/tensorflow/tensorboard/issues/3597#issuecomment-1490793918 has very specific repro steps that I created "only" 7 months ago ("only" compared to the 4 years that this issue has been open).

bersbersbers avatar Oct 25 '23 03:10 bersbersbers