
Batch accuracy and batch loss are not being plotted in browser or vscode plugin

Open JohnAtl opened this issue 1 year ago • 2 comments

This link in the bug report text did not work for me:

https://raw.githubusercontent.com/tensorflow/tensorboard/master/tensorboard/tools/diagnose_tensorboard.py

/home/user/work/diagnose_tensorboard.py:32: DeprecationWarning: 'pipes' is deprecated and slated for removal in Python 3.13
  import pipes

Diagnostics

Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version df7af2c6fc0e4c4a5b47aeae078bc7ad95777ffa

--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=12, micro=2, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='beast', release='6.6.21-1-lts', version='#1 SMP PREEMPT_DYNAMIC Wed, 06 Mar 2024 16:59:55 +0000', machine='x86_64')
INFO: sys.getwindowsversion(): N/A

--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None

--- check: installed_packages
INFO: installed: tensorboard==2.16.2
INFO: installed: tensorflow==2.16.1
WARNING: no installation among: ['tensorflow-estimator', 'tensorflow-estimator-2.0-preview', 'tf-estimator-nightly']
INFO: installed: tensorboard-data-server==0.7.2

--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.16.2'

--- check: tensorflow_python_version
2024-03-14 13:58:36.829200: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-03-14 13:58:36.851097: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-03-14 13:58:37.239354: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
INFO: tensorflow.__version__: '2.16.1'
INFO: tensorflow.__git_version__: 'v2.16.1-0-g5bc9d26649c'

--- check: tensorboard_data_server_version
INFO: data server binary: '/home/john/work/Sleep/.venv/lib/python3.12/site-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.7.2'

--- check: tensorboard_binary_path
INFO: which tensorboard: b'/home/john/work/Sleep/.venv/bin/tensorboard\n'

--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]

--- check: readable_fqdn
INFO: socket.getfqdn(): 'beast'

--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=1853, st_dev=44, st_nlink=2, st_uid=1000, st_gid=1000, st_size=40, st_atime=1710438727, st_mtime=1710438986, st_ctime=1710438986)
INFO: mode: 0o40777

--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/home/john/work/Sleep/.venv/lib/python3.12/site-packages']; bad_roots (0): []

--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==2.1.0
asttokens==2.4.1
astunparse==1.6.3
bidict==0.23.1
biosppy==2.1.2
certifi==2024.2.2
charset-normalizer==3.3.2
colorama==0.4.6
colorlog==6.8.2
comm==0.2.2
contourpy==1.2.0
cycler==0.12.1
debugpy==1.8.1
decorator==5.1.1
dm-tree==0.1.8
easydev==0.13.1
edfio==0.4.0
executing==2.0.1
flatbuffers==24.3.7
fonttools==4.49.0
future==1.0.0
gast==0.5.4
google-pasta==0.2.0
grpcio==1.62.1
h5py==3.10.0
idna==3.6
ipykernel==6.29.3
ipython==8.22.2
jedi==0.19.1
Jinja2==3.1.3
joblib==1.3.2
jupyter_client==8.6.1
jupyter_core==5.7.2
keras==3.0.5
kiwisolver==1.4.5
lazy_loader==0.3
libclang==16.0.6
lightgbm==4.3.0
line-profiler==4.1.2
lxml==5.1.0
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mdurl==0.1.2
ml-dtypes==0.3.2
mne==1.6.1
namex==0.0.7
nest-asyncio==1.6.0
nolds==0.5.2
numpy==1.26.4
nvidia-cublas-cu12==12.3.4.1
nvidia-cuda-cupti-cu12==12.3.101
nvidia-cuda-nvcc-cu12==12.3.107
nvidia-cuda-nvrtc-cu12==12.3.107
nvidia-cuda-runtime-cu12==12.3.101
nvidia-cudnn-cu12==8.9.7.29
nvidia-cufft-cu12==11.0.12.1
nvidia-curand-cu12==10.3.4.107
nvidia-cusolver-cu12==11.5.4.101
nvidia-cusparse-cu12==12.2.0.103
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
opencv-python==4.9.0.80
opt-einsum==3.3.0
packaging==24.0
pandas==2.2.1
parso==0.8.3
pexpect==4.9.0
pillow==10.2.0
pip==24.0
platformdirs==4.2.0
pooch==1.8.1
prompt-toolkit==3.0.43
protobuf==4.25.3
psutil==5.9.8
ptyprocess==0.7.0
pure-eval==0.2.2
Pygments==2.17.2
pyhrv==0.4.1
pyparsing==3.1.2
python-dateutil==2.9.0.post0
pytz==2024.1
PyWavelets==1.5.0
pyzmq==25.1.2
requests==2.31.0
rich==13.7.1
scikit-learn==1.4.1.post1
scipy==1.12.0
seaborn==0.13.2
setuptools==69.2.0
shortuuid==1.0.13
six==1.16.0
spectrum==0.8.1
stack-data==0.6.3
tensorboard==2.16.2
tensorboard-data-server==0.7.2
tensorflow==2.16.1
termcolor==2.4.0
threadpoolctl==3.3.0
tornado==6.4
tqdm==4.66.2
traitlets==5.14.2
typing_extensions==4.10.0
tzdata==2024.1
urllib3==2.2.1
wcwidth==0.2.13
Werkzeug==3.0.1
wheel==0.43.0
wrapt==1.16.0

The same issue appears in both the VS Code plugin and Firefox (screenshot attached).

Issue description

The batch_accuracy and batch_loss scalars are not being plotted. There is a single dot at the center, but this screenshot was taken after some 3100 batches, so a line should have been plotted for both.

Callbacks in my model.fit:

        callbacks=[
            tf.keras.callbacks.TensorBoard(log_dir=LOG_PATH, update_freq="batch"),
            chkpt_callback,
        ],

JohnAtl · Mar 14 '24 18:03

Would you mind running tensorboard --inspect --logdir <your log directory> and providing the results?

groszewn · Mar 14 '24 19:03

Sure!

inspect output
======================================================================
Processing event files... (this can take a few minutes)
======================================================================

Found event files in:
logs/train
logs/validation

These tags are in logs/train:
audio -
histograms -
images -
scalars -
tensor
   batch_accuracy
   batch_loss
   epoch_accuracy
   epoch_learning_rate
   epoch_loss
   keras
======================================================================

Event statistics for logs/train:
audio -
graph
   first_step           0
   last_step            0
   max_step             0
   min_step             0
   num_steps            1
   outoforder_steps     []
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor
   first_step           0
   last_step            0
   max_step             9
   min_step             0
   num_steps            10
   outoforder_steps     [(1, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0), (9, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (7, 0), (8, 0)]
======================================================================

These tags are in logs/validation:
audio -
histograms -
images -
scalars -
tensor
   epoch_accuracy
   epoch_loss
   evaluation_accuracy_vs_iterations
   evaluation_loss_vs_iterations
======================================================================

Event statistics for logs/validation:
audio -
graph -
histograms -
images -
scalars -
sessionlog:checkpoint -
sessionlog:start -
sessionlog:stop -
tensor
   first_step           4744
   last_step            8
   max_step             47480
   min_step             0
   num_steps            49
   outoforder_steps     [(4744, 0), (9488, 1), (4691, 0), (9382, 1), (14073, 2), (18764, 3), (23455, 4), (28146, 5), (32837, 6), (37528, 7), (42219, 8), (46910, 9), (4691, 0), (9382, 1), (14073, 2), (18764, 3), (23455, 4), (28146, 5), (32837, 6), (37528, 7), (42219, 8), (46910, 9), (4744, 0), (9488, 1), (14232, 2), (18976, 3), (23720, 4), (28464, 5), (33208, 6), (37952, 7), (42696, 8), (47440, 9), (4748, 0), (9496, 1), (14244, 2), (18992, 3), (23740, 4), (28488, 5), (33236, 6), (37984, 7), (42732, 8), (47480, 9), (4719, 0), (9438, 1), (14157, 2), (18876, 3), (23595, 4), (28314, 5), (33033, 6), (37752, 7), (42471, 8)]
======================================================================
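For anyone reading the `outoforder_steps` lists above: each pair appears to be (previous step, current step) at a point where the step value went down between adjacent events. The following is a minimal sketch of that check, assuming this interpretation is right. The helper name `find_out_of_order` is hypothetical and not part of the TensorBoard API, and the sample sequence is only one that is consistent with the first pairs reported for logs/validation, not the actual event stream.

```python
from typing import Iterable, List, Tuple


def find_out_of_order(steps: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (previous, current) pairs where the step value decreases.

    Hypothetical helper: it only illustrates how adjacent steps that
    break a non-decreasing order could be collected.
    """
    steps = list(steps)
    return [(prev, cur) for prev, cur in zip(steps, steps[1:]) if cur < prev]


# Assumed sequence, chosen to be consistent with the first pairs
# listed for logs/validation; the real event order is not shown.
validation_steps = [4744, 0, 9488, 1, 4691, 0]
print(find_out_of_order(validation_steps))
# -> [(4744, 0), (9488, 1), (4691, 0)]
```

Read this way, the logs/train list is mostly pairs of the form (n, 0), meaning the step value repeatedly drops back to 0 after an epoch-numbered event.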

The display in Scalars is the same, and the single dot is updated with the latest value each time the 30-second update triggers.

image

JohnAtl · Mar 15 '24 12:03