
No step marker observed and hence the step time is unknown

Open pritamdodeja opened this issue 2 years ago • 9 comments

Consider Stack Overflow for getting support using TensorBoard—they have a larger community with better searchability:

https://stackoverflow.com/questions/tagged/tensorboard

Do not use this template for setup, installation, or configuration issues. Instead, use the “installation problem” issue template:

https://github.com/tensorflow/tensorboard/issues/new?template=installation_problem.md

To report a problem with TensorBoard itself, please fill out the remainder of this template.

Environment information (required)

Please run diagnose_tensorboard.py (link below) in the same environment from which you normally run TensorFlow/TensorBoard, and paste the output here:

https://raw.githubusercontent.com/tensorflow/tensorboard/master/tensorboard/tools/diagnose_tensorboard.py

Diagnostics

Diagnostics output
--- check: autoidentify                                                         
INFO: diagnose_tensorboard.py version 516a2f9433ba4f9c3a4fdb0f89735870eda054a1  
                                                                                
--- check: general                                                              
INFO: sys.version_info: sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
INFO: os.name: posix                                                            
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='71d6fe811d18', release='6.0.5-200.fc36.x86_64', version='#1 SMP PREEMPT_DYNAMIC Wed Oct 26 15:55:21 UTC 2022', machine='x86_64')
INFO: sys.getwindowsversion(): N/A                                              
                                                                                
--- check: package_management                                                   
INFO: has conda-meta: False                                                     
INFO: $VIRTUAL_ENV: None                                                        
                                                                                
--- check: installed_packages                                                   
INFO: installed: tensorboard==2.11.0                                            
WARNING: no installation among: ['tensorflow', 'tensorflow-gpu', 'tf-nightly', 'tf-nightly-2.0-preview', 'tf-nightly-gpu', 'tf-nightly-gpu-2.0-preview']
INFO: installed: tensorflow-estimator==2.11.0                                   
INFO: installed: tensorboard-data-server==0.6.1                                 
                                                                                
--- check: tensorboard_python_version                                           
INFO: tensorboard.version.VERSION: '2.11.0'                                     
                                                                                
--- check: tensorflow_python_version                                            
INFO: tensorflow.__version__: '2.11.0'                                          
INFO: tensorflow.__git_version__: 'v2.11.0-rc2-17-gd5b57ca93e5'                 
                                                                                
--- check: tensorboard_data_server_version                                      
INFO: data server binary: '/usr/local/lib/python3.8/dist-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.6.1'                            
                                                                                
--- check: tensorboard_binary_path                                              
INFO: which tensorboard: b'/usr/local/bin/tensorboard\n'                        
                                                                                
--- check: addrinfos                                                            
socket.has_ipv6 = True                                                          
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>                                 
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>                                
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>                          
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>                                 
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>                                 
Loopback infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>                                     
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]
                                                                                
--- check: readable_fqdn                                                        
INFO: socket.getfqdn(): '71d6fe811d18'                                          
                                                                                
--- check: stat_tensorboardinfo                                                 
INFO: directory: /tmp/.tensorboard-info                                         
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=805882112, st_dev=51, st_nlink=2, st_uid=0, st_gid=0, st_size=6, st_atime=1677293427, st_mtime=1677293598, st_ctime=1677293598)
INFO: mode: 0o40777                                                             
                                                                                
--- check: source_trees_without_genfiles                                        
INFO: tensorboard_roots (1): ['/usr/local/lib/python3.8/dist-packages']; bad_roots (0): []
                                                                                
--- check: full_pip_freeze                                                      
INFO: pip freeze --all:                                                         
absl-py==1.3.0                                                                  
anyio==3.6.2                                                                    
argon2-cffi==21.3.0                                                             
argon2-cffi-bindings==21.2.0                                                    
asttokens==2.1.0                                                                
astunparse==1.6.3                                                               
attrs==22.1.0                                                                   
backcall==0.2.0                                                                 
beautifulsoup4==4.11.1                                                          
bleach==5.0.1                                                                   
cachetools==5.2.0                                                               
certifi==2022.9.24                                                              
cffi==1.15.1                                                                    
charset-normalizer==2.1.1                                                       
contourpy==1.0.6                                                                
cycler==0.11.0                                                                  
debugpy==1.6.3                                                                  
decorator==5.1.1                                                                
defusedxml==0.7.1                                                               
entrypoints==0.4                                                                
executing==1.2.0                                                                
fastjsonschema==2.16.2                                                          
flatbuffers==22.10.26                                                           
fonttools==4.38.0                                                               
gast==0.4.0                                                                     
google-auth==2.14.1                                                             
google-auth-oauthlib==0.4.6                                                     
google-pasta==0.2.0                                                             
grpcio==1.50.0                                                                  
gviz-api==1.10.0                                                                
h5py==3.7.0                                                                     
idna==3.4                                                                       
importlib-metadata==5.0.0                                                       
importlib-resources==5.10.0                                                     
ipykernel==5.1.1                                                                
ipython==8.6.0                                                                  
ipython-genutils==0.2.0                                                         
ipywidgets==8.0.2                                                               
jedi==0.17.2                                                                    
Jinja2==3.1.2                                                                   
jsonschema==4.17.0                                                              
jupyter==1.0.0                                                                  
jupyter-client==7.4.7                                                           
jupyter-console==6.4.4                                                          
jupyter-core==5.0.0                                                             
jupyter-http-over-ws==0.0.8                                                     
jupyter-server==1.23.2                                                          
jupyterlab-pygments==0.2.2                                                      
jupyterlab-widgets==3.0.3                                                       
keras==2.11.0                                                                   
kiwisolver==1.4.4                                                               
libclang==14.0.6                                                                
Markdown==3.4.1                                                                 
MarkupSafe==2.1.1                                                               
matplotlib==3.6.2                                                               
matplotlib-inline==0.1.6                                                        
mistune==2.0.4                                                                  
nbclassic==0.4.8                                                                
nbclient==0.7.0                                                                 
nbconvert==7.2.5                                                                
nbformat==4.4.0                                                                 
nest-asyncio==1.5.6                                                             
notebook==6.5.2                                                                 
notebook-shim==0.2.2                                                            
numpy==1.23.4                                                                   
oauthlib==3.2.2                                                                 
opt-einsum==3.3.0                                                               
packaging==21.3                                                                 
pandocfilters==1.5.0                                                            
parso==0.7.1                                                                    
pexpect==4.8.0                                                                  
pickleshare==0.7.5                                                              
Pillow==9.3.0                                                                   
pip==20.2.4                                                                     
pkgutil-resolve-name==1.3.10                                                    
platformdirs==2.5.4                                                             
prometheus-client==0.15.0                                                       
prompt-toolkit==3.0.32                                                          
protobuf==3.19.6                                                                
psutil==5.9.4                                                                   
ptyprocess==0.7.0                                                               
pure-eval==0.2.2                                                                
pyasn1==0.4.8                                                                   
pyasn1-modules==0.2.8                                                           
pycparser==2.21                                                                 
Pygments==2.13.0                                                                
pyparsing==3.0.9                                                                
pyrsistent==0.19.2                                                              
python-dateutil==2.8.2                                                          
pyzmq==24.0.1                                                                   
qtconsole==5.4.0                                                                
QtPy==2.3.0                                                                     
requests==2.28.1                                                                
requests-oauthlib==1.3.1                                                        
rsa==4.9                                                                        
Send2Trash==1.8.0                                                               
setuptools==65.5.1                                                              
six==1.16.0                                                                     
sniffio==1.3.0                                                                  
soupsieve==2.3.2.post1                                                          
stack-data==0.6.1                                                               
tensorboard==2.11.0                                                             
tensorboard-data-server==0.6.1                                                  
tensorboard-plugin-profile==2.11.1                                              
tensorboard-plugin-wit==1.8.1                                                   
tensorflow-cpu==2.11.0                                                          
tensorflow-estimator==2.11.0                                                    
tensorflow-io-gcs-filesystem==0.27.0                                            
termcolor==2.1.0                                                                
terminado==0.17.0                                                               
tinycss2==1.2.1                                                                 
tornado==6.2                                                                    
traitlets==5.5.0                                                                
typing-extensions==4.4.0                                                        
urllib3==1.26.12                                                                
wcwidth==0.2.5                                                                  
webencodings==0.5.1                                                             
websocket-client==1.4.2                                                         
Werkzeug==2.2.2                                                                 
wheel==0.34.2                                                                   
widgetsnbextension==4.0.3                                                       
wrapt==1.14.1                                                                   
zipp==3.10.0                                                                    
                                                                                

Next steps

No action items identified. Please copy ALL of the above output,
including the lines containing only backticks, into your GitHub issue
or comment. Be sure to redact any sensitive information.
For browser-related issues, please additionally specify:

  • Browser type and version (e.g., Chrome 64.0.3282.140):
  • Screenshot, if it’s a visual issue:
[Screenshot attached in the original issue]

Issue description

I am running a very standard example of the TensorBoard callback (code below) and getting the "No step marker observed" warning.

import tensorflow as tf
import datetime
mnist = tf.keras.datasets.mnist

(x_train, y_train),(x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

def create_model():
  return tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28), name='layers_flatten'),
    tf.keras.layers.Dense(512, activation='relu', name='layers_dense'),
    tf.keras.layers.Dropout(0.2, name='layers_dropout'),
    tf.keras.layers.Dense(10, activation='softmax', name='layers_dense_2')
  ])

model = create_model()
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch=(1,50))

model.fit(x=x_train, 
          y=y_train, 
          epochs=5, 
          validation_data=(x_test, y_test), 
          callbacks=[tensorboard_callback])



Please describe the bug as clearly as possible. How can we reproduce the problem without additional resources (including external data files and proprietary Python modules)?

Step markers are either not being logged by Keras or not being read by TensorBoard. I would expect this information to be logged so that I can use it to optimize my tf.data usage. The environment is a standard TensorFlow Docker container; the only additional package installed is tensorboard_plugin_profile.

pritamdodeja avatar Feb 25 '23 03:02 pritamdodeja

I also experienced a similar problem, with TF 2.11. I used a GPU, CUPTI was working, and many "Tools" in the profiler showed data:

  • trace_viewer shows both CPU and GPU operations
  • memory_profile shows GPU memory utilization
  • ...

but some tools did not:

  • overview_page showed the same warning as above,
  • input_pipeline_analyzer showed a warning about no step markers,
  • pod_viewer showed the same warning about no step marker being observed.

When I upgraded the protobuf package to 3.20.3, the profile plugin started showing all the data. However, TF 2.11 is not compatible with that protobuf version (it requires protobuf < 3.20), so I ended up creating a separate virtual environment just for TensorBoard 2.11, with protobuf 3.20.3.

Also note that with TF 2.12.0rc0 the profile plugin does not even load: TF 2.12.0rc0 uses protobuf major version 4 by default, and the profile plugin is not compatible with it (it seems to require protobuf < 4). I assume the 2.12 version of the profile plugin will be made compatible (or will explicitly require protobuf < 4).
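For anyone comparing their own setup against the versions mentioned in this comment, here is a minimal sketch that lists the installed versions of the relevant packages. The helper names (`installed_versions`, `version_tuple`) and the package list are illustrative assumptions, not part of TensorBoard or the profile plugin; only the standard-library `importlib.metadata` API is used, and the script makes no claim about which versions actually work together.

```python
from importlib import metadata

# Packages discussed in this thread; the list itself is an assumption
# and can be adjusted (e.g. "tensorflow-cpu" vs. "tensorflow").
PACKAGES = (
    "tensorflow",
    "tensorflow-cpu",
    "tensorboard",
    "tensorboard-plugin-profile",
    "protobuf",
)


def installed_versions(packages=PACKAGES):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def version_tuple(version):
    """Turn '3.20.3' into (3, 20, 3), keeping only leading digits of each part."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
    protobuf = installed_versions(["protobuf"])["protobuf"]
    if protobuf is not None:
        # Only reports the major/minor version so it can be compared
        # with the observations above.
        print("protobuf major.minor:", version_tuple(protobuf)[:2])
```

Running it inside the environment that launches TensorBoard shows whether that environment differs from the one used for training.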

foxik avatar Feb 25 '23 11:02 foxik

Thank you very much for your help @foxik! I'll try and get this to work the way you have suggested and spend some time understanding protocol buffers better to be able to understand the root cause. I will provide updates here.

pritamdodeja avatar Feb 25 '23 13:02 pritamdodeja

Hi @pritamdodeja ,

The profiler lives in its own repo: https://github.com/tensorflow/profiler/issues. Could you re-raise this issue there, and perhaps mention the information from @foxik in case this really is a protobuf compatibility issue?

bmd3k avatar Feb 28 '23 17:02 bmd3k

@bmd3k traveling for work, will do this in the next couple of days. Thanks!

pritamdodeja avatar Mar 02 '23 04:03 pritamdodeja

@bmd3k I have opened the issue at https://github.com/tensorflow/profiler/issues/578 - let me know if I can go ahead and close it out here. Thanks!

pritamdodeja avatar Mar 18 '23 09:03 pritamdodeja

Still having the same issue in TensorFlow 2.12.0 with protobuf=3.20.0. Is there a fix for the issue? Thank you!

I have the same issue with TF 2.11.

pietroorlandi avatar Apr 22 '23 12:04 pietroorlandi

Any update here, please?

The bug is caused by this commit, in which GroupTfEvents was moved from the producer side to PreprocessSingleHostXPlane() (the consumer side). When there is only one XSpace, the grouping and preprocessing are skipped (code), so the generated op stats have no group_id defined and no step marker is generated (code).
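The control flow described above can be illustrated with a toy model. This is only a sketch of the reasoning, not the profiler's actual code: the functions `group_events`, `preprocess`, and `step_markers` and the dictionary-based event format are hypothetical stand-ins for the real C++ implementation.

```python
def group_events(events):
    """Toy stand-in for grouping: tag each event with a group_id taken from its step."""
    return [dict(event, group_id=event["step"]) for event in events]


def preprocess(xspaces):
    """Toy version of the described behaviour: grouping is skipped for a single XSpace."""
    if len(xspaces) > 1:
        return [group_events(events) for events in xspaces]
    return xspaces  # single XSpace: events are passed through without group_id


def step_markers(xspaces):
    """Derive step markers from group_id; events without group_id contribute none."""
    return sorted(
        {event["group_id"] for events in xspaces for event in events if "group_id" in event}
    )


# A single-host profile (one XSpace) ends up with no step markers...
single = [[{"name": "MatMul", "step": 0}, {"name": "MatMul", "step": 1}]]
print(step_markers(preprocess(single)))

# ...while the same events alongside a second XSpace do get grouped.
multi = single + [[{"name": "Relu", "step": 1}]]
print(step_markers(preprocess(multi)))
```

In this toy model the single-XSpace case is the one that matches a profile captured on one machine, which is the setup in the original report.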

The bug was fixed in TF 2.12 by this commit.

I have validated that TF 2.12 can read previously generated TF profiling results.

[Screenshot: validate_fix]

supercharleszhu avatar Aug 28 '23 20:08 supercharleszhu