No step marker observed and hence the step time is unknown
Consider Stack Overflow for getting support using TensorBoard—they have a larger community with better searchability:
https://stackoverflow.com/questions/tagged/tensorboard
Do not use this template for setup, installation, or configuration issues. Instead, use the “installation problem” issue template:
https://github.com/tensorflow/tensorboard/issues/new?template=installation_problem.md
To report a problem with TensorBoard itself, please fill out the remainder of this template.
Environment information (required)
Please run diagnose_tensorboard.py (link below) in the same
environment from which you normally run TensorFlow/TensorBoard, and
paste the output here:
https://raw.githubusercontent.com/tensorflow/tensorboard/master/tensorboard/tools/diagnose_tensorboard.py
Diagnostics
Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version 516a2f9433ba4f9c3a4fdb0f89735870eda054a1
--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='71d6fe811d18', release='6.0.5-200.fc36.x86_64', version='#1 SMP PREEMPT_DYNAMIC Wed Oct 26 15:55:21 UTC 2022', machine='x86_64')
INFO: sys.getwindowsversion(): N/A
--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None
--- check: installed_packages
INFO: installed: tensorboard==2.11.0
WARNING: no installation among: ['tensorflow', 'tensorflow-gpu', 'tf-nightly', 'tf-nightly-2.0-preview', 'tf-nightly-gpu', 'tf-nightly-gpu-2.0-preview']
INFO: installed: tensorflow-estimator==2.11.0
INFO: installed: tensorboard-data-server==0.6.1
--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.11.0'
--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.11.0'
INFO: tensorflow.__git_version__: 'v2.11.0-rc2-17-gd5b57ca93e5'
--- check: tensorboard_data_server_version
INFO: data server binary: '/usr/local/lib/python3.8/dist-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.6.1'
--- check: tensorboard_binary_path
INFO: which tensorboard: b'/usr/local/bin/tensorboard\n'
--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]
--- check: readable_fqdn
INFO: socket.getfqdn(): '71d6fe811d18'
--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=805882112, st_dev=51, st_nlink=2, st_uid=0, st_gid=0, st_size=6, st_atime=1677293427, st_mtime=1677293598, st_ctime=1677293598)
INFO: mode: 0o40777
--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/usr/local/lib/python3.8/dist-packages']; bad_roots (0): []
--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==1.3.0
anyio==3.6.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.1.0
astunparse==1.6.3
attrs==22.1.0
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.1
cachetools==5.2.0
certifi==2022.9.24
cffi==1.15.1
charset-normalizer==2.1.1
contourpy==1.0.6
cycler==0.11.0
debugpy==1.6.3
decorator==5.1.1
defusedxml==0.7.1
entrypoints==0.4
executing==1.2.0
fastjsonschema==2.16.2
flatbuffers==22.10.26
fonttools==4.38.0
gast==0.4.0
google-auth==2.14.1
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
grpcio==1.50.0
gviz-api==1.10.0
h5py==3.7.0
idna==3.4
importlib-metadata==5.0.0
importlib-resources==5.10.0
ipykernel==5.1.1
ipython==8.6.0
ipython-genutils==0.2.0
ipywidgets==8.0.2
jedi==0.17.2
Jinja2==3.1.2
jsonschema==4.17.0
jupyter==1.0.0
jupyter-client==7.4.7
jupyter-console==6.4.4
jupyter-core==5.0.0
jupyter-http-over-ws==0.0.8
jupyter-server==1.23.2
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.3
keras==2.11.0
kiwisolver==1.4.4
libclang==14.0.6
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.6.2
matplotlib-inline==0.1.6
mistune==2.0.4
nbclassic==0.4.8
nbclient==0.7.0
nbconvert==7.2.5
nbformat==4.4.0
nest-asyncio==1.5.6
notebook==6.5.2
notebook-shim==0.2.2
numpy==1.23.4
oauthlib==3.2.2
opt-einsum==3.3.0
packaging==21.3
pandocfilters==1.5.0
parso==0.7.1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.3.0
pip==20.2.4
pkgutil-resolve-name==1.3.10
platformdirs==2.5.4
prometheus-client==0.15.0
prompt-toolkit==3.0.32
protobuf==3.19.6
psutil==5.9.4
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser==2.21
Pygments==2.13.0
pyparsing==3.0.9
pyrsistent==0.19.2
python-dateutil==2.8.2
pyzmq==24.0.1
qtconsole==5.4.0
QtPy==2.3.0
requests==2.28.1
requests-oauthlib==1.3.1
rsa==4.9
Send2Trash==1.8.0
setuptools==65.5.1
six==1.16.0
sniffio==1.3.0
soupsieve==2.3.2.post1
stack-data==0.6.1
tensorboard==2.11.0
tensorboard-data-server==0.6.1
tensorboard-plugin-profile==2.11.1
tensorboard-plugin-wit==1.8.1
tensorflow-cpu==2.11.0
tensorflow-estimator==2.11.0
tensorflow-io-gcs-filesystem==0.27.0
termcolor==2.1.0
terminado==0.17.0
tinycss2==1.2.1
tornado==6.2
traitlets==5.5.0
typing-extensions==4.4.0
urllib3==1.26.12
wcwidth==0.2.5
webencodings==0.5.1
websocket-client==1.4.2
Werkzeug==2.2.2
wheel==0.34.2
widgetsnbextension==4.0.3
wrapt==1.14.1
zipp==3.10.0
Next steps
No action items identified. Please copy ALL of the above output,
including the lines containing only backticks, into your GitHub issue
or comment. Be sure to redact any sensitive information.
For browser-related issues, please additionally specify:
- Browser type and version (e.g., Chrome 64.0.3282.140):
- Screenshot, if it’s a visual issue:
Issue description
I am running a very standard example of the TensorBoard callback (code below) and getting the "No step marker observed" warning.
import tensorflow as tf
import datetime

mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def create_model():
    return tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28), name='layers_flatten'),
        tf.keras.layers.Dense(512, activation='relu', name='layers_dense'),
        tf.keras.layers.Dropout(0.2, name='layers_dropout'),
        tf.keras.layers.Dense(10, activation='softmax', name='layers_dense_2')
    ])

model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
# Profile batches 1 through 50.
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch=(1, 50))

model.fit(x=x_train,
          y=y_train,
          epochs=5,
          validation_data=(x_test, y_test),
          callbacks=[tensorboard_callback])
Please describe the bug as clearly as possible. How can we reproduce the problem without additional resources (including external data files and proprietary Python modules)?
Step markers are either not being logged by Keras or not being read by TensorBoard. I would expect this information to be logged so that I can use the module for optimizing tf.data usage. The environment is a standard TensorFlow Docker container; the only additional package installed is tensorboard_plugin_profile.
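A quick way to narrow this down is to check whether the profiler wrote any capture files at all. The following is a minimal sketch, assuming the captures are stored as `*.xplane.pb` files somewhere under the callback's log_dir (the exact subdirectory layout may differ between versions, so the search is recursive). `find_profile_captures` is a hypothetical helper name, not part of TensorFlow or TensorBoard.

```python
from pathlib import Path


def find_profile_captures(log_root):
    """Return all *.xplane.pb files under log_root, sorted by path.

    Assumption: profiler captures use the .xplane.pb suffix.
    """
    root = Path(log_root)
    if not root.is_dir():
        return []
    return sorted(root.rglob("*.xplane.pb"))


if __name__ == "__main__":
    # "logs/fit" matches the log_dir prefix used in the example above.
    for capture in find_profile_captures("logs/fit"):
        print(capture, capture.stat().st_size, "bytes")
```

If files are listed, the capture itself was written and the missing step markers are more likely lost while the data is read or converted. This check does not inspect the contents of the files.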
I also experienced a similar problem, with TF 2.11. I used a GPU, CUPTI was working, and many "Tools" in the profiler showed data:
- trace_viewer shows both CPU and GPU operations
- memory_profile shows GPU memory utilization
- ...
but some tools did not:
- overview_page, showing the same warning as above,
- input_pipeline_analyzer, which showed a warning about no step markers,
- pod_viewer, showing the same warning about no step marker being observed.
However, when I upgraded the protobuf package to 3.20.3, the profile plugin started showing all the data. Since TF 2.11 is not compatible with this protobuf version (it requires protobuf < 3.20), I ended up creating a separate virtual environment just for TensorBoard 2.11, with protobuf 3.20.3.
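To see which situation a given environment is in, the installed versions can be compared with the ones reported in this thread. This is only a sketch: the bounds (3.20.3 as the version that worked, major version 4 as too new for the profile plugin) come from the reports here and are not an official compatibility matrix, and `parse_version` and `protobuf_status` are hypothetical helper names.

```python
from importlib import metadata


def parse_version(text):
    """Turn '3.20.3' into (3, 20, 3); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def protobuf_status(version, minimum=(3, 20, 3), too_new=(4,)):
    """Classify a protobuf version against the bounds reported in this thread."""
    parsed = parse_version(version)
    if parsed >= too_new:
        return "too new"
    if parsed < minimum:
        return "too old"
    return "ok"


if __name__ == "__main__":
    # Print the installed versions of the packages discussed above.
    for name in ("protobuf", "tensorboard", "tensorboard-plugin-profile"):
        try:
            print(name, metadata.version(name))
        except metadata.PackageNotFoundError:
            print(name, "not installed")
```

Run it inside the environment that serves TensorBoard, since that is where the profile plugin loads protobuf.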
Also note that with TF 2.12.0rc0, the profile plugin does not even load, because TF 2.12.0rc0 by default uses protobuf with major version 4, but the profile plugin is not compatible with it (it seems to require protobuf < 4) -- but I assume the 2.12 version of the profile plugin will be made compatible (or explicitly require protobuf < 4).
Thank you very much for your help @foxik! I'll try and get this to work the way you have suggested and spend some time understanding protocol buffers better to be able to understand the root cause. I will provide updates here.
Hi @pritamdodeja ,
The profiler is at its own repo: https://github.com/tensorflow/profiler/issues Could you reraise this issue there and perhaps mention the information from @foxik in case this really is a protobuf compatibility issue?
@bmd3k traveling for work, will do this in the next couple of days. Thanks!
@bmd3k I have opened the issue at https://github.com/tensorflow/profiler/issues/578 - let me know if I can go ahead and close it out here. Thanks!
Still having the same issue in TensorFlow 2.12.0 with protobuf=3.20.0. Is there a fix for the issue? Thank you!
I have the same issue with TF2.11
Any update here, please?
The bug is caused by this commit, where GroupTfEvents was moved from the producer side to PreprocessSingleHostXPlane() (the consumer side). When there is only one XSpace, the grouping and preprocessing are skipped (code), so the generated op stats have no group_id defined and no step marker is generated (code).
The bug was fixed in TF 2.12 by this commit.
I have validated that TF 2.12 can read previously generated TF profiling results.
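The control flow described above can be illustrated with a toy model. This is not the TensorFlow code: every function name below is hypothetical, events are plain dicts, and `convert_fixed` only shows one plausible shape of a fix, not the actual fix commit.

```python
def preprocess(xspace):
    """Toy stand-in for grouping/preprocessing: tag each event with a group_id."""
    return [dict(event, group_id=index) for index, event in enumerate(xspace)]


def convert_buggy(xspaces):
    """Preprocess only when there is more than one XSpace (the reported behavior)."""
    if len(xspaces) > 1:
        xspaces = [preprocess(x) for x in xspaces]
    return xspaces


def convert_fixed(xspaces):
    """Always preprocess, regardless of how many XSpaces there are."""
    return [preprocess(x) for x in xspaces]


def count_step_markers(xspaces):
    """In this toy model, only events carrying a group_id yield a step marker."""
    return sum(1 for x in xspaces for event in x if "group_id" in event)


# A single-host capture: one XSpace with two training-step events.
single_host = [[{"name": "train_step"}, {"name": "train_step"}]]
print(count_step_markers(convert_buggy(single_host)))  # 0
print(count_step_markers(convert_fixed(single_host)))  # 2
```

With a single XSpace the buggy path never assigns a group_id, which matches the "No step marker observed" warning seen on single-host runs.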