Tensorboard 2.9.1 --logdir as aws s3 path
I am using TensorBoard 2.9.1. When I set --logdir to s3://<bucket>/<folder>, TensorBoard is not able to read the event files.
On my machine (an EC2 instance) I can reach that logdir via the AWS CLI (aws s3 ls s3://<bucket>/<folder>).
In Python I can also reach the files in that folder using tensorflow_io:
import tensorflow as tf
import tensorflow_io as tfio
data = tf.io.read_file("s3://<bucket>/<folder>/<file>")
This is the Tensorboard command:
AWS_REGION=us-east-1 S3_REGION=us-east-1 S3_ENDPOINT=s3.us-east-1.amazonaws.com S3_USE_HTTPS=1 S3_VERIFY_SSL=0 AWS_LOG_LEVEL=1 CUDA_VISIBLE_DEVICES="" tensorboard --logdir s3://<bucket>/<folder> --host 0.0.0.0
This is the error output:
2023-04-28 14:37:22.039640: W tensorflow/c/logging.cc:37] Retry Strategy will use the default max attempts.
2023-04-28 14:37:22.039685: W tensorflow/c/logging.cc:37] Retry Strategy will use the default max attempts.
2023-04-28 14:37:22.039714: W tensorflow/c/logging.cc:37] Token file must be specified to use STS AssumeRole web identity creds provider.
2023-04-28 14:37:22.039730: W tensorflow/c/logging.cc:37] Retry Strategy will use the default max attempts.
2023-04-28 14:37:22.068841: E tensorflow/c/logging.cc:40] HTTP response code: 404
Resolved remote host IP address:
Request ID:
Exception name:
Error message: No response body.
5 response headers:
content-type : application/xml
date : Fri, 28 Apr 2023 14:37:21 GMT
server : AmazonS3
x-amz-id-2 : Am5XM8hPcYQIbatGgTDYxOo0yxcPBkGFh5tg5tdM1bor4zc9Yzb1jkBZ0cd0rjaJ1XXJXoHk/tY=
x-amz-request-id : RNH62MB09RNQRT3H
2023-04-28 14:37:22.068889: W tensorflow/c/logging.cc:37] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2023-04-28 14:37:22.097549: E tensorflow/c/logging.cc:40] HTTP response code: 404
I would expect TensorBoard to use tensorflow_io's S3 filesystem (tensorflow_io/core/filesystems/s3/), but from the messages above that does not seem to be happening.
Note from the diagnostics report that I am using tensorflow-io==0.26.0 and tensorflow-io-gcs-filesystem==0.26.0.
Additionally, I tried running TensorBoard from a Python script but got the same problem:
import os
import tensorflow as tf
import tensorflow_io as tfio
from tensorboard import program
os.environ["AWS_REGION"]="us-east-1"
os.environ["S3_REGION"]="us-east-1"
os.environ["S3_ENDPOINT"]="s3.us-east-1.amazonaws.com"
os.environ["S3_USE_HTTPS"]="1"
os.environ["S3_VERIFY_SSL"]="0"
os.environ["AWS_LOG_LEVEL"]="1"
tracking_address = 's3://<bucket>/<folder>' # the path of your log file.
host_ip = "0.0.0.0"
if __name__ == "__main__":
    tb = program.TensorBoard()
    tb.configure(argv=[None, '--logdir', tracking_address, '--bind_all'])
    url = tb.launch()
    print(f"Tensorflow listening on {url}")
Environment information (required)
Diagnostics
Diagnostics output
--- check: autoidentify
INFO: diagnose_tensorboard.py version df7af2c6fc0e4c4a5b47aeae078bc7ad95777ffa
--- check: general
INFO: sys.version_info: sys.version_info(major=3, minor=9, micro=5, releaselevel='final', serial=0)
INFO: os.name: posix
INFO: os.uname(): posix.uname_result(sysname='Linux', nodename='ip-xxx-xx-xx-xxx', release='5.15.0-1026-aws', version='#30~20.04.2-Ubuntu SMP Fri Nov 25 14:53:22 UTC 2022', machine='x86_64')
INFO: sys.getwindowsversion(): N/A
--- check: package_management
INFO: has conda-meta: False
INFO: $VIRTUAL_ENV: None
--- check: installed_packages
INFO: installed: tensorboard==2.9.1
INFO: installed: tensorflow==2.9.2
INFO: installed: tensorflow-estimator==2.9.0
INFO: installed: tensorboard-data-server==0.6.1
--- check: tensorboard_python_version
INFO: tensorboard.version.VERSION: '2.9.1'
--- check: tensorflow_python_version
INFO: tensorflow.__version__: '2.9.2'
INFO: tensorflow.__git_version__: 'v2.9.1-132-g18960c44ad3'
--- check: tensorboard_data_server_version
INFO: data server binary: '/home/ubuntu/.local/lib/python3.9/site-packages/tensorboard_data_server/bin/server'
INFO: data server binary version: b'rustboard 0.6.1'
--- check: tensorboard_binary_path
INFO: which tensorboard: b'/home/ubuntu/.local/bin/tensorboard\n'
--- check: addrinfos
socket.has_ipv6 = True
socket.AF_UNSPEC = <AddressFamily.AF_UNSPEC: 0>
socket.SOCK_STREAM = <SocketKind.SOCK_STREAM: 1>
socket.AI_ADDRCONFIG = <AddressInfo.AI_ADDRCONFIG: 32>
socket.AI_PASSIVE = <AddressInfo.AI_PASSIVE: 1>
Loopback flags: <AddressInfo.AI_ADDRCONFIG: 32>
Loopback infos: [(<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::1', 0, 0, 0)), (<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('127.0.0.1', 0))]
Wildcard flags: <AddressInfo.AI_PASSIVE: 1>
Wildcard infos: [(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('0.0.0.0', 0)), (<AddressFamily.AF_INET6: 10>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('::', 0, 0, 0))]
--- check: readable_fqdn
INFO: socket.getfqdn(): 'ip-xxx-xx-xx-xxx.ec2.internal'
--- check: stat_tensorboardinfo
INFO: directory: /tmp/.tensorboard-info
INFO: os.stat(...): os.stat_result(st_mode=16895, st_ino=144084, st_dev=66306, st_nlink=2, st_uid=1000, st_gid=1000, st_size=4096, st_atime=1682631005, st_mtime=1682691735, st_ctime=1682691735)
INFO: mode: 0o40777
--- check: source_trees_without_genfiles
INFO: tensorboard_roots (1): ['/home/ubuntu/.local/lib/python3.9/site-packages']; bad_roots (0): []
--- check: full_pip_freeze
INFO: pip freeze --all:
absl-py==1.3.0
aiohttp==3.8.1
aiohttp-cors==0.7.0
aiosignal==1.3.1
alabaster==0.7.12
albumentations==1.2.0
alembic==1.10.3
antlr4-python3-runtime==4.9.3
anyio==3.6.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.2.1
astunparse==1.6.3
async-generator==1.10
async-timeout==4.0.2
attrs==21.4.0
Automat==0.8.0
autopage==0.5.1
Babel==2.11.0
backcall==0.2.0
beautifulsoup4==4.11.1
black==22.12.0
bleach==5.0.1
blessed==1.20.0
blinker==1.4
bokeh==3.0.2
boto3==1.22.6
botocore==1.25.13
cachetools==5.2.0
certifi==2022.12.7
cffi==1.15.1
cfgv==3.3.1
chardet==3.0.4
charset-normalizer==2.1.1
click==8.1.3
cliff==4.2.0
cloud-init==23.1.2
cloudpickle==2.0.0
cmaes==0.9.1
cmd2==2.4.3
colorama==0.4.3
colorful==0.5.5
colorlog==6.7.0
comm==0.1.2
command-not-found==0.3
commonmark==0.9.1
configobj==5.0.6
constantly==15.1.0
contourpy==1.0.6
conversions==0.0.2
cryptography==2.8
curio==1.6
cycler==0.11.0
Cython==0.29.32
dbus-python==1.2.16
debugpy==1.6.4
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.6
distinctipy==1.2.2
distlib==0.3.6
distro==1.4.0
distro-info===0.23ubuntu1
dm-tree==0.1.7
docker-pycreds==0.4.0
docrepr==0.2.0
docutils==0.17.1
ec2-hibinit-agent==1.0.0
edward2==0.0.2
einops==0.4.1
entrypoints==0.3
etils==0.9.0
exceptiongroup==1.0.4
executing==1.2.0
fastjsonschema==2.16.2
filelock==3.8.2
flatbuffers==1.12
focal-loss==0.0.7
fonttools==4.38.0
fqdn==1.5.1
frozenlist==1.3.3
fsspec==2022.11.0
gast==0.4.0
gin-config==0.5.0
gitdb==4.0.10
GitPython==3.1.29
google-api-core==2.11.0
google-api-python-client==2.69.0
google-auth==2.15.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
googleapis-common-protos==1.57.0
gpustat==1.1
greenlet==2.0.2
grpcio==1.43.0
gviz-api==1.10.0
h5py==3.7.0
hibagent==1.0.1
html2text==2020.1.16
httplib2==0.21.0
hydra-colorlog==1.1.0
hydra-core==1.2.0
hydra-joblib-launcher==1.2.0
hydra-optuna-sweeper==1.2.0
hydra-ray-launcher==1.2.0
hyperlink==19.0.0
identify==2.5.9
idna==3.4
imageio==2.22.4
imagesize==1.4.1
importlib-metadata==4.13.0
importlib-resources==5.10.1
incremental==16.10.1
iniconfig==1.1.1
ipykernel==6.19.2
ipympl==0.9.2
ipyparallel==8.4.1
ipython==8.7.0
ipython-genutils==0.2.0
ipywidgets==8.0.4
isoduration==20.11.0
jedi==0.18.2
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.1.1
jsonpatch==1.22
jsonpointer==2.0
jsonschema==4.17.3
jupyter-events==0.5.0
jupyter_client==7.4.8
jupyter_core==5.1.0
jupyter_server==2.0.5
jupyter_server_terminals==0.4.3
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.5
kaggle==1.5.12
keras==2.9.0
Keras-Preprocessing==1.1.2
keyring==18.0.1
kiwisolver==1.4.4
language-selector==0.1
launchpadlib==1.10.13
lazr.restfulclient==0.14.2
lazr.uri==1.0.3
libclang==14.0.6
llvmlite==0.39.1
lxml==4.9.1
Mako==1.2.4
Markdown==3.4.1
MarkupSafe==2.1.1
matplotlib==3.5.1
matplotlib-inline==0.1.6
mistune==2.0.4
more-itertools==4.2.0
msgpack==1.0.5
multidict==6.0.4
multiprocess==0.70.14
mypy-extensions==0.4.3
nbclassic==0.4.8
nbclient==0.7.2
nbconvert==7.2.7
nbformat==5.7.1
nest-asyncio==1.5.6
netifaces==0.10.4
networkx==2.8.8
nmslib==2.1.1
nodeenv==1.7.0
notebook==6.5.2
notebook_shim==0.2.2
numba==0.56.4
numpy==1.21.5
nvidia-ml-py==11.525.112
oauth2client==4.1.3
oauthlib==3.2.2
omegaconf==2.2.2
opencensus==0.11.2
opencensus-context==0.1.3
opencv-python==4.6.0.66
opencv-python-headless==4.6.0.66
opt-einsum==3.3.0
optuna==2.10.1
outcome==1.2.0
packaging==22.0
pandas==1.4.3
pandocfilters==1.5.0
parso==0.8.3
pathos==0.3.0
pathspec==0.10.3
pathtools==0.1.2
pbr==5.11.1
pexpect==4.6.0
pickle5==0.0.11
pickleshare==0.7.5
Pillow==9.3.0
pip==23.1.2
platformdirs==2.6.0
pluggy==1.0.0
portalocker==2.6.0
pox==0.3.2
ppft==1.7.6.6
pre-commit==2.20.0
prettytable==3.7.0
prometheus-client==0.13.1
promise==2.3
prompt-toolkit==3.0.36
protobuf==3.19.6
protobuf3-to-dict==0.1.5
psutil==5.9.4
ptyprocess==0.7.0
pure-eval==0.2.2
py==1.11.0
py-cpuinfo==9.0.0
py-spy==0.3.14
pyarrow==10.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybind11==2.6.1
pycocotools==2.0.6
pycparser==2.21
pydash==5.1.2
Pygments==2.13.0
PyGObject==3.36.0
pygwalker==0.1.4
PyHamcrest==1.9.0
PyJWT==1.7.1
pymacaroons==0.13.0
PyNaCl==1.3.0
pynndescent==0.5.8
PyOpenGL==3.1.6
pyOpenSSL==19.0.0
pyparsing==3.0.9
pyperclip==1.8.2
PyQt5==5.14.1
PyQt6==6.4.0
PyQt6-Qt6==6.4.1
PyQt6-sip==13.4.0
pyqtgraph==0.13.1
pyrsistent==0.15.5
pyserial==3.4
pytest==6.2.5
pytest-asyncio==0.20.3
python-apt==2.0.0+ubuntu0.20.4.8
python-dateutil==2.8.2
python-debian===0.1.36ubuntu1
python-json-logger==2.0.4
python-slugify==7.0.0
python-version==0.0.2
pytz==2022.6
PyWavelets==1.4.1
PyYAML==5.4.1
pyzmq==24.0.1
qtconsole==5.4.0
QtPy==2.3.0
qudida==0.0.4
ray==1.12.0
recommonmark==0.7.1
regex==2022.10.31
requests==2.28.1
requests-oauthlib==1.3.1
requests-unixsocket==0.2.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==12.6.0
rsa==4.9
s3fs==0.4.2
s3transfer==0.5.2
sacrebleu==2.3.1
sagemaker==2.109.0
scikit-image==0.18.3
scikit-learn==1.1.1
scipy==1.7.3
seaborn==0.12.1
SecretStorage==2.3.1
Send2Trash==1.8.0
sentencepiece==0.1.97
sentry-sdk==1.11.1
seqeval==1.2.2
service-identity==18.1.0
setproctitle==1.3.2
setuptools==61.2.0
shortuuid==1.0.11
simplejson==3.16.0
sip==4.19.21
six==1.16.0
smart-open==6.3.0
smdebug-rulesconfig==1.0.1
smmap==5.0.0
sniffio==1.3.0
snowballstemmer==2.2.0
sortedcontainers==2.4.0
sos==4.4
soupsieve==2.3.2.post1
Sphinx==5.3.0
sphinx-markdown-builder==0.5.5
sphinx-rtd-theme==1.1.1
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
SQLAlchemy==2.0.9
ssh-import-id==5.10
stack-data==0.6.2
stevedore==5.0.0
systemd-python==234
tabulate==0.9.0
tensorboard==2.9.1
tensorboard-data-server==0.6.1
tensorboard-plugin-profile==2.11.1
tensorboard-plugin-wit==1.8.1
tensorflow==2.9.2
tensorflow-addons==0.17.1
tensorflow-datasets==4.7.0
tensorflow-decision-forests==0.2.7
tensorflow-estimator==2.9.0
tensorflow-hub==0.12.0
tensorflow-io==0.26.0
tensorflow-io-gcs-filesystem==0.26.0
tensorflow-metadata==1.12.0
tensorflow-model-optimization==0.7.3
tensorflow-probability==0.17.0
tensorflow-similarity==0.16.8
tensorflow-text==2.9.0
termcolor==2.1.1
terminado==0.17.1
testpath==0.6.0
text-unidecode==1.3
tf-models-official==2.9.2
tf-slim==1.1.0
threadpoolctl==3.1.0
tifffile==2022.10.10
tinycss2==1.2.1
toml==0.10.2
tomli==2.0.1
tornado==6.2
tqdm==4.64.1
traitlets==5.7.0
trio==0.22.0
Twisted==18.9.0
typeguard==2.13.3
typing_extensions==4.4.0
ubuntu-advantage-tools==27.12
ufw==0.36
umap-learn==0.5.3
unattended-upgrades==0.1
unify==0.5
untokenize==0.1.1
uri-template==1.2.0
uritemplate==4.1.1
urllib3==1.26.13
validators==0.20.0
virtualenv==20.17.1
vit-keras==0.1.0
wadllib==1.3.3
wandb==0.12.18
wcwidth==0.2.5
webcolors==1.12
webencodings==0.5.1
websocket-client==1.4.2
Werkzeug==2.2.2
wheel==0.38.4
widgetsnbextension==4.0.5
wrapt==1.14.1
wurlitzer==3.0.3
xyzservices==2022.9.0
yapf==0.32.0
yarl==1.8.2
zipp==3.11.0
zope.interface==4.7.1
As expected the problem is with tensorflow_io not being used. I propose a few solutions:
- Imports, in backend/event_processing/io_wrapper.py:
import tensorflow as tf
import tensorflow_io as tfio
import s3fs
Note the import of s3fs: this is because tf.io.gfile.glob is VERY slow when recursing through an AWS S3 path.
- Walk through s3 path:
def S3ListRecursivelyViaWalking(top):
    s3 = s3fs.S3FileSystem()
    for dir_path, _, filenames in s3.walk(top, topdown=True, refresh=True):
        yield (
            "s3://" + dir_path,
            (os.path.join("s3://" + dir_path, filename) for filename in filenames),
        )
- Use above method to index s3 path:
if io_util.IsCloudPath(path):
    # Glob-ing for files can be significantly faster than recursively
    # walking through directories for some file systems.
    logger.info(
        "GetLogdirSubdirectories: Starting to list directories via glob-ing."
    )
    if io_util.IsS3Path(path):
        traversal_method = S3ListRecursivelyViaWalking
    else:
        traversal_method = ListRecursivelyViaGlobbing
- Add an io_util.IsS3Path function in util/io_util.py:
def IsS3Path(path):
    return path.startswith("s3://")
Thoughts?
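The dispatch and walking logic proposed above can be tried offline, without TensorBoard or s3fs installed. The following is a minimal sketch, not the actual patch: `list_recursively_via_walking`, `pick_traversal_method`, and the in-memory `fake_walk` are hypothetical names. `fake_walk` only imitates the `(dir_path, dirnames, filenames)` tuples that `s3fs.S3FileSystem().walk` yields, with the `s3://` scheme stripped from `dir_path`, which is why the scheme is re-added.

```python
import posixpath


def is_s3_path(path):
    """Mirror of the proposed io_util.IsS3Path."""
    return path.startswith("s3://")


def list_recursively_via_walking(top, walk):
    """Yield (dir_path, file_paths) pairs, re-adding the s3:// scheme.

    `walk` is injected so the sketch runs offline; in the proposal it
    would be s3fs.S3FileSystem().walk.
    """
    for dir_path, _, filenames in walk(top):
        full_dir = "s3://" + dir_path
        # posixpath keeps "/" separators regardless of the host OS.
        yield full_dir, [posixpath.join(full_dir, name) for name in filenames]


def pick_traversal_method(path, s3_method, default_method):
    """Choose the S3-specific walker for s3:// paths, else the default."""
    return s3_method if is_s3_path(path) else default_method


def fake_walk(top):
    """Hypothetical stand-in for an S3 walk over a tiny in-memory tree."""
    root = top[len("s3://"):] if top.startswith("s3://") else top
    tree = {
        root: (["run1"], []),
        root + "/run1": ([], ["events.out.tfevents.1"]),
    }
    for dir_path, (dirnames, filenames) in tree.items():
        yield dir_path, dirnames, filenames


listing = list(list_recursively_via_walking("s3://bucket/logs", fake_walk))
```

Injecting the walker keeps the S3-specific part in one place, so the same generator can be exercised in tests with a fake and in production with the real filesystem object.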
Hi @Krasner,
We added S3 support in https://github.com/tensorflow/tensorboard/pull/5491 (since TensorBoard v2.6). If the S3 directory parsing failed because tensorflow-io was not found, the error message would be something like Error: Unsupported filename scheme S3... (e.g. https://github.com/tensorflow/tensorboard/issues/5480), and it would prompt you to install TF I/O. I can see from the diagnostics output that the TF I/O dependency exists in your environment, so I'm not sure this is an issue with identifying and parsing S3 files.
The error messages "Error message: No response body" and "If the signature check failed. This could be because of a time skew. Attempting to adjust the signer" look like a permission or configuration issue related to S3. I'm not familiar with AWS; is it possible to adjust AWS_LOG_LEVEL (or maybe there is another arg) to get more information about the failure?
@yatbear I don't think it's a permission issue. As I noted above, I can access AWS S3 from my EC2 instance, and if I import tensorflow_io in my script then I am also able to access AWS files with tf.io.gfile. Without the explicit import of this library, however, tf.io.gfile fails.
Interestingly, after the fixes above the error messages are still visible with AWS_LOG_LEVEL=1, but TensorBoard is able to access event files on S3.
Additionally, as I mentioned, tf.io.gfile is very slow compared to s3fs for accessing S3 files.
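Since the report above is that S3 access through tf.io.gfile works only after an explicit `import tensorflow_io`, one option is to perform that import deliberately and check for it up front. This is a minimal sketch under that assumption; the helper name `try_register_s3_filesystem` is hypothetical and not part of TensorBoard or tensorflow-io.

```python
import importlib
import importlib.util


def try_register_s3_filesystem(module_name="tensorflow_io"):
    """Import the module for its side effect of registering file systems.

    Returns True if the module was found and imported, False if it is
    not installed. Errors raised during the import itself propagate.
    """
    if importlib.util.find_spec(module_name) is None:
        return False
    importlib.import_module(module_name)
    return True
```

A caller could run this before touching any s3:// path and print an install hint when it returns False.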
@Krasner, thanks for the clarification and the proposed solutions above! I just saw this open issue in the tensorflow-io repo: https://github.com/tensorflow/io/issues/1731, which suggests the problem lies in tensorflow-io. A temporary workaround mentioned in https://github.com/tensorflow/io/issues/1731#issuecomment-1332779337 is to pin the tensorflow-io dependency to 0.27.0. Could you try this? In the meantime, I will do a bit more investigation before adding the new dependency s3fs.
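To confirm which tensorflow-io version is actually installed before and after pinning, something like the following could be used. It is a minimal sketch using only the standard library; the helper names are hypothetical, and the comparison assumes plain numeric `X.Y.Z` version strings (anything else is treated as not matching).

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(version, pin="0.27.0"):
    """True if `version` equals the pin, comparing numeric components."""
    if version is None:
        return False

    def as_tuple(text):
        return tuple(int(part) for part in text.split("."))

    try:
        return as_tuple(version) == as_tuple(pin)
    except ValueError:
        # Non-numeric components (e.g. dev or rc suffixes) never match.
        return False


current = installed_version("tensorflow-io")
print(f"tensorflow-io installed: {current}, matches pin: {matches_pin(current)}")
```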
I saw this recent fix related to S3: https://github.com/tensorflow/io/pull/1790, but it is not included in the latest tensorflow-io pip version: https://pypi.org/project/tensorflow-io/#history, and their nightly is also stale. I left a comment under the aforementioned PR.
I am getting the same error when using tensorboard --logdir s3://zenml-minio-store/logs/... I used the following versions: tensorflow=2.8.0, tensorboard=2.8.0, tensorflow-io=0.24.0. I have tried updating to tensorflow=2.12.0, tensorboard=2.12.3, tensorflow-io=0.33.0, but it still doesn't work.