tensorboard
Mutually incompatible hparams sets
Diagnostics
Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version 724b56cee52e7d8eb89bbeec1f0d5ce3e38c9682
--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=8, micro=2, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='redacted', release='4.12.14-lp151.28.32-default', version='#1 SMP Wed Nov 13 07:50:15 UTC 2019 (6e1aaad)', machine='x86_64')
INFO: sys.getwindowsversion(): N/A
--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None
--- check: installed_packages
INFO: installed: tensorboard==2.2.1
INFO: installed: tensorflow==2.2.0rc4
INFO: installed: tensorflow-estimator==2.2.0
--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.2.1'
--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.2.0-rc4'
INFO: tensorflow.__git_version__: 'v2.2.0-rc3-33-g70087ab4f4'
--- check: tensorboard_binary_path
INFO: which tensorboard: b'/home/redacted/.pyenv/versions/3.8.2/bin/tensorboard\n'
--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]
--- check: readable_fqdn
INFO: socket.getfqdn(): 'redacted'
--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=3407924, st_dev=66306, st_nlink=2, st_uid=145278, st_gid=890, st_size=4096, st_atime=1588742078, st_mtime=1588839093, st_ctime=1588839093)
INFO: mode: 0o40777
--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/home/redacted/.pyenv/versions/3.8.2/lib/python3.8/site-packages']; bad_roots (0): []
--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==0.9.0
alabaster==0.7.12
appdirs==1.4.3
argcomplete==1.10.0
astroid==2.4.1
astunparse==1.6.3
attrs==19.3.0
Babel==2.8.0
backcall==0.1.0
bandit==1.6.2
beautifulsoup4==4.8.0
black==19.10b0
blessed==1.17.5
cachetools==4.1.0
cachey==0.2.1
certifi==2020.4.5.1
chardet==3.0.4
click==7.1.2
colour-science==0.3.15
cycler==0.10.0
dask==2.15.0
decorator==4.4.2
docopt==0.6.2
docutils==0.16
docx2txt==0.8
EbookLib==0.17.1
entrypoints==0.3
et-xmlfile==1.0.1
ExifRead==2.1.2
extract-msg==0.23.1
flake8==3.7.9
flyingcircus==0.1.2.1
flyingcircus-numeric==0.1.1.1
freetype-py==2.1.0.post1
fsspec==0.7.3
gast==0.3.3
gitdb==4.0.5
GitPython==3.1.2
google-auth==1.14.2
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.28.1
gviz-api==1.9.0
h5py==2.10.0
HeapDict==1.0.1
humanize==2.4.0
idna==2.9
imagecodecs==2020.2.18
imageio==2.8.0
imagesize==1.2.0
IMAPClient==2.1.0
ipykernel==5.2.1
ipython==7.14.0
ipython-genutils==0.2.0
isort==4.3.21
jdcal==1.4.1
jedi==0.17.0
Jinja2==2.11.2
joblib==0.14.1
json-spec==0.10.1
json5==0.9.4
jsoncomment==0.4.2
jstyleson==0.0.2
jupyter-client==6.1.3
jupyter-core==4.6.3
Keras==2.3.1
Keras-Applications==1.0.8
Keras-Preprocessing @ file:///home/redacted/pip_patches/keras-preprocessing
keras-vis==0.4.1
kiwisolver==1.2.0
lazy-object-proxy==1.4.3
lxml==4.5.0
Markdown==3.2.1
MarkupSafe==1.1.1
matplotlib==3.2.1
mccabe==0.6.1
mypy==0.770
mypy-extensions==0.4.3
napari==0.3.0
napari-plugin-engine==0.1.4
napari-svg==0.1.2
natsort==7.0.1
networkx==2.4
nose==1.3.7
numexpr==2.7.1
numpy==1.18.4
numpydoc==0.9.2
oauthlib==3.1.0
olefile==0.46
opencv-contrib-python==4.2.0.34
opencv-contrib-python-headless==4.2.0.34
openpyxl==3.0.3
opt-einsum==3.2.1
ordered-set==4.0.1
packaging==20.3
pandas==1.0.3
parso==0.7.0
pathspec==0.8.0
pathvalidate==2.3.0
pbr==5.4.5
pdfminer.six==20181108
pexpect==4.8.0
pickleshare==0.7.5
Pillow==7.1.2
pip==20.1
pipdeptree==0.13.2
pipreqs==0.4.10
prompt-toolkit==3.0.5
protobuf==3.11.3
psutil==5.7.0
ptyprocess==0.6.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycodestyle==2.5.0
pycryptodome==3.9.7
pydocstyle==5.0.2
pyflakes==2.2.0
Pygments==2.6.1
pylama==7.7.1
pylint==2.5.2
pylint-plugin-utils==0.6
pynrrd==0.4.2
PyOpenGL==3.1.5
pyparsing==2.4.7
PyPDF2==1.26.0
PyQt5==5.14.2
PyQt5-sip==12.7.2
PySide2==5.14.2.1
python-dateutil==2.8.1
python-pptx==0.6.18
pytz==2020.1
PyWavelets==1.1.1
PyYAML==5.3.1
pyzmq==19.0.0
qtconsole==4.7.3
QtPy==1.9.0
raster-geometry==0.1.4.1
regex==2020.4.4
requests==2.23.0
requests-oauthlib==1.3.0
rope==0.17.0
rsa==4.0
scikit-image==0.16.2
scikit-learn==0.22.2.post1
scikit-plot==0.3.7
scipy==1.4.1
seaborn==0.10.1
setuptools==46.1.3
setuptools-scm==3.5.0
SharedArray==3.2.1
shiboken2==5.14.2.1
silence-tensorflow==1.1.1
six==1.12.0
smmap==3.0.4
snowballstemmer==2.0.0
sortedcontainers==2.1.0
soupsieve==2.0
SpeechRecognition==3.8.1
Sphinx==3.0.3
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==1.0.3
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.4
stevedore==1.32.0
tensorboard==2.2.1
tensorboard-plugin-profile==2.2.0
tensorboard-plugin-wit==1.6.0.post3
tensorflow==2.2.0rc4
tensorflow-addons @ git+https://github.com/tensorflow/addons.git@ad132da23a8162eb97c435676dd7426e622a0074
tensorflow-determinism==0.3.0
tensorflow-estimator==2.2.0
termcolor==1.1.0
textract==1.6.3
tf-explain==0.2.1
tifffile==2020.5.5
toml==0.10.0
toolz==0.10.0
tornado==6.0.4
trackpy==0.4.2
traitlets==4.3.3
typed-ast==1.4.1
typeguard==2.7.1
typing-extensions==3.7.4.2
tzlocal==1.5.1
urllib3==1.25.9
vispy==0.6.4
wcwidth==0.1.9
Werkzeug==1.0.1
wheel==0.34.2
wrapt==1.12.1
xlrd==1.2.0
XlsxWriter==1.2.8
yarg==0.1.9
Next steps
No action items identified.
Issue description
I have a set of three hparams event files, 1, 2a and 2b, from basically the same code except for differences in some parameters. So my _all folder looks like this:
./1:
events.out.tfevents.1588833243.redacted.29208.547.v2
./2a:
events.out.tfevents.1588835195.redacted.30899.547.v2
./2b:
events.out.tfevents.1588835761.redacted.31914.547.v2
When I start tensorboard --logdir _all/1, I see file 1 in http://localhost:6006/#hparams.
When I start tensorboard --logdir _all/2a, I see file 2a in http://localhost:6006/#hparams.
When I start tensorboard --logdir _all/2b, I see file 2b in http://localhost:6006/#hparams.
The actual issue
When I start tensorboard --logdir _all, I see only file 1 in http://localhost:6006/#hparams.
I also have a folder _1+2a, in which I only see file 1 (issue); and a folder 2a+b, in which I see both file 2a and 2b (expected).
How can I (help) investigate what is going wrong here?
Here's the output of --inspect:
======================================================================
Processing event files... (this can take a few minutes)
======================================================================
Found event files in:
_all/1
_all/2a
_all/2b
These tags are in _all/1:
audio -
histograms -
images -
scalars -
tensor
_hparams_/session_end_info
_hparams_/session_start_info
======================================================================
Event statistics for _all/1:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor
first_step 0
last_step 0
max_step 0
min_step 0
num_steps 1
outoforder_steps []
======================================================================
These tags are in _all/2a:
audio -
histograms -
images -
scalars -
tensor
_hparams_/session_end_info
_hparams_/session_start_info
======================================================================
Event statistics for _all/2a:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor
first_step 0
last_step 0
max_step 0
min_step 0
num_steps 1
outoforder_steps []
======================================================================
These tags are in _all/2b:
audio -
histograms -
images -
scalars -
tensor
_hparams_/session_end_info
_hparams_/session_start_info
======================================================================
Event statistics for _all/2b:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor
first_step 0
last_step 0
max_step 0
min_step 0
num_steps 1
outoforder_steps []
======================================================================
I have identified the issue, I believe. For the record, this is the code I used:
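from tensorboard.plugins.hparams import metadata
from tensorflow.python.summary.summary_iterator import summary_iterator

# FILENAMES: paths to the event files to compare (the three files listed above).
for filename in FILENAMES:
    for summary in summary_iterator(filename):
        try:
            content = summary.summary.value[0].metadata.plugin_data.content
        except IndexError:
            # Events without summary values carry no hparams metadata.
            continue
        info = metadata.parse_session_start_info_plugin_data(content)
        print(info)
        # Only the first parseable record per file is needed.
        break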
Apart from minor changes in number_values (2.0 vs. 10.0 vs. 1000.0), I noticed that file 1 has
hparams {
  key: "stop_early_epochs"
  value {
    string_value: "None"
  }
}
while the other set (2a/2b) has
hparams {
  key: "stop_early_epochs"
  value {
    number_value: 20.0
  }
}
It seems this seriously confuses TensorBoard, making it impossible to load files across the two sets together. Can this be fixed easily?
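To spot this kind of mismatch without reading the protobuf dumps by eye, the parsed hparams can be compared per key. The following is only a minimal sketch in plain Python: `find_mixed_type_hparams`, the `runs` dictionary and the `some_number` key are hypothetical, and which numeric value belongs to which run is illustrative rather than taken from the event files.

```python
from collections import defaultdict


def find_mixed_type_hparams(runs):
    """Return {hparam name: {type name: [run names]}} for every hparam
    whose Python value type differs between runs."""
    seen = defaultdict(lambda: defaultdict(list))
    for run_name, hparams in runs.items():
        for key, value in hparams.items():
            seen[key][type(value).__name__].append(run_name)
    # Keep only the keys that were logged with more than one type.
    return {
        key: dict(by_type)
        for key, by_type in seen.items()
        if len(by_type) > 1
    }


# Hypothetical values modelled on the three runs in this issue.
runs = {
    "1": {"stop_early_epochs": "None", "some_number": 2.0},
    "2a": {"stop_early_epochs": 20.0, "some_number": 10.0},
    "2b": {"stop_early_epochs": 20.0, "some_number": 1000.0},
}
mixed = find_mixed_type_hparams(runs)
print(mixed)
# {'stop_early_epochs': {'str': ['1'], 'float': ['2a', '2b']}}
```

Any key reported here was logged with more than one type and is a candidate for the behaviour described in this issue.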
Further digging yields this comment:
https://github.com/tensorflow/tensorboard/blob/026585b8c19b7472a213f087b200e21bf8cf26d0/tensorboard/plugins/hparams/backend_context.py#L228-L243
Note the last lines,
If all values have the same type ... Otherwise, the returned type is DATA_TYPE_STRING.
This somewhat explains why, as soon as one set with a string value for "stop_early_epochs" is loaded, I only see the file sets that have a string value. Interestingly, TensorBoard shows both values in the HyperParameters panel, so I'd say this is a bug.
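The rule quoted above can be illustrated with a small stand-alone sketch. It is not TensorBoard's code: `infer_column_type` is a hypothetical helper, it uses Python type names instead of the real data type enum, and it assumes the elided part of the comment means that a type shared by all values is kept.

```python
def infer_column_type(values):
    """Return the shared Python type name if all values agree,
    otherwise fall back to the string type (as the quoted comment says)."""
    kinds = {type(v).__name__ for v in values}
    if len(kinds) == 1:
        return kinds.pop()
    return "DATA_TYPE_STRING"


# Sets 2a and 2b alone: every value is a number, so the column stays numeric.
print(infer_column_type([20.0, 20.0]))
# Adding file 1, which logged the string "None", flips the column to string.
print(infer_column_type(["None", 20.0, 20.0]))
```

Under that assumption, a single run with a string value is enough to change how the whole column is treated, which matches what I observe.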
A possible fix:
https://github.com/tensorflow/tensorboard/blob/026585b8c19b7472a213f087b200e21bf8cf26d0/tensorboard/plugins/hparams/list_session_groups.py#L538-L539
Replace these two lines by
def filter_fn(value):
    if len(discrete_set):
        set_type = type(discrete_set[0])
        value = set_type(value)
    return value in discrete_set
There may be smarter solutions (such as getting the type from the associated hparams_info, and safer casting obviously), but that does it for me for now.
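As for the safer casting, here is a rough sketch of what a guarded variant might look like. The factory name `make_discrete_set_filter` is hypothetical and this is not the code in list_session_groups.py; it only assumes that `discrete_set` is a sequence of plain Python values.

```python
def make_discrete_set_filter(discrete_set):
    """Build a predicate that tests membership, retrying once after
    casting the value to the type of the set's first element."""
    discrete_set = list(discrete_set)

    def filter_fn(value):
        # Fast path: the value is already in the set as-is.
        if value in discrete_set:
            return True
        if not discrete_set:
            return False
        set_type = type(discrete_set[0])
        try:
            return set_type(value) in discrete_set
        except (TypeError, ValueError):
            # The value cannot be converted, so it cannot be a member.
            return False

    return filter_fn


string_filter = make_discrete_set_filter(["None", "20.0"])
number_filter = make_discrete_set_filter([20.0])
print(string_filter(20.0))    # str(20.0) == "20.0" is in the set
print(number_filter("None"))  # float("None") fails, so not a member
```

Note that a cast to bool would need special handling, since `bool("False")` is `True` in Python.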
Code to repro this:
# tensorboard --logdir logs
# http://localhost:6006/#hparams
import tensorflow as tf
from tensorboard.plugins.hparams import api as hp

def run(param):
    with tf.summary.create_file_writer("logs/param_" + str(param)).as_default():
        hp.hparams({"param": param})
        tf.summary.scalar("metric", 0, step=0)

run(0)
run(1)
# Screenshot 1
run("str")
# Screenshot 2
Screenshot 1 shows runs 0 and 1, as expected:
Screenshot 2 shows run "str", but hides runs 0 and 1:
Error message:
E0330 21:06:31.824003 13996 hparams_plugin.py:154] HParams error: Cannot use an interval filter for a value of type: <class 'str'>, Value: str
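The message suggests that a numeric interval filter ends up being applied to the string value. A plain-Python sketch of that failure mode follows; `make_interval_filter` is a hypothetical stand-in, and raising `TypeError` is an assumption made for illustration, not the plugin's actual error class.

```python
def make_interval_filter(min_value, max_value):
    """Build a predicate that accepts numbers inside [min_value, max_value]
    and refuses to compare anything that is not a number."""

    def filter_fn(value):
        if not isinstance(value, (int, float)):
            raise TypeError(
                "Cannot use an interval filter for a value of type: %s, Value: %s"
                % (type(value), value)
            )
        return min_value <= value <= max_value

    return filter_fn


in_unit_interval = make_interval_filter(0, 1)
print(in_unit_interval(0))  # runs 0 and 1 pass the numeric filter
try:
    in_unit_interval("str")  # the third run logged a string
except TypeError as exc:
    print(exc)
```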