tensorboard
botocore.errorfactory.NoSuchKey when old TF Events got deleted
Consider Stack Overflow for getting support using TensorBoard—they have a larger community with better searchability:
https://stackoverflow.com/questions/tagged/tensorboard
Do not use this template for setup, installation, or configuration issues. Instead, use the “installation problem” issue template:
https://github.com/tensorflow/tensorboard/issues/new?template=installation_problem.md
To report a problem with TensorBoard itself, please fill out the remainder of this template.
Environment information (required)
Please run diagnose_tensorboard.py
(link below) in the same
environment from which you normally run TensorFlow/TensorBoard, and
paste the output here:
tensorboard==2.9.1
https://raw.githubusercontent.com/tensorflow/tensorboard/master/tensorboard/tools/diagnose_tensorboard.py
For browser-related issues, please additionally specify:
- Browser type and version (e.g., Chrome 64.0.3282.140):
- Screenshot, if it’s a visual issue:
Issue description
Please describe the bug as clearly as possible. How can we reproduce the problem without additional resources (including external data files and proprietary Python modules)?
When TensorBoard reads TFEvents files from S3, deleting TFEvents files from the same logdir triggers event_file_loader exceptions like the following:
Exception in thread Reloader 15:
Traceback (most recent call last):
File "/usr/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/plugin_event_multiplexer.py", line 239, in Worker
accumulator.Reload()
File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/plugin_event_accumulator.py", line 183, in Reload
for event in self._generator.Load():
File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/directory_watcher.py", line 88, in Load
for event in self._LoadInternal():
File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/directory_watcher.py", line 118, in _LoadInternal
for event in self._loader.Load():
File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/event_file_loader.py", line 270, in Load
for event in super(EventFileLoader, self).Load():
File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/event_file_loader.py", line 244, in Load
for record in super(LegacyEventFileLoader, self).Load():
File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/event_file_loader.py", line 178, in Load
yield next(self._iterator)
File "/usr/local/lib/python3.10/dist-packages/tensorboard/backend/event_processing/event_file_loader.py", line 109, in __next__
self._reader.GetNext()
File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/tensorflow_stub/pywrap_tensorflow.py", line 207, in GetNext
header_str = self._read(8)
File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/tensorflow_stub/pywrap_tensorflow.py", line 273, in _read
new_data = self.file_handle.read(n)
File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/tensorflow_stub/io/gfile.py", line 727, in read
(self.buff, self.continuation_token) = self.fs.read(
File "/usr/local/lib/python3.10/dist-packages/tensorboard/compat/tensorflow_stub/io/gfile.py", line 287, in read
stream = s3.Object(bucket, path).get(**args)["Body"].read()
File "/usr/local/lib/python3.10/dist-packages/boto3/resources/factory.py", line 520, in do_action
response = action(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/boto3/resources/action.py", line 83, in __call__
response = getattr(parent.meta.client, operation_name)(*args, **params)
File "/usr/local/lib/python3.10/dist-packages/botocore/client.py", line 391, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/usr/local/lib/python3.10/dist-packages/botocore/client.py", line 719, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
This exception blocks any new events from being processed. A similar issue is: https://github.com/tensorflow/tensorboard/issues/2634
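To illustrate the behavior that would be preferable, here is a minimal sketch (not TensorBoard's actual code) of a reload loop that skips a file whose S3 object has disappeared instead of letting the error kill the reloader thread. The names `is_no_such_key` and `load_skipping_deleted` are hypothetical. The check relies only on the `response["Error"]["Code"]` field that botocore's `ClientError` carries, so the sketch does not import boto3.

```python
import logging
from typing import Any, Callable, Iterable, Iterator, Tuple

logger = logging.getLogger(__name__)


def is_no_such_key(exc: BaseException) -> bool:
    """Return True if exc looks like a botocore ClientError with code NoSuchKey."""
    response = getattr(exc, "response", None)
    if not isinstance(response, dict):
        return False
    return response.get("Error", {}).get("Code") == "NoSuchKey"


def load_skipping_deleted(
    readers: Iterable[Tuple[str, Callable[[], Iterator[Any]]]],
) -> Iterator[Any]:
    """Yield records from each (path, read_fn) pair.

    If reading a file fails because its object no longer exists, log a
    warning, stop reading that file, and move on to the next one.
    Any other error is re-raised unchanged.
    """
    for path, read_fn in readers:
        try:
            records = iter(read_fn())
        except Exception as exc:
            if not is_no_such_key(exc):
                raise
            logger.warning("Skipping deleted event file: %s", path)
            continue
        while True:
            try:
                record = next(records)
            except StopIteration:
                break
            except Exception as exc:
                if not is_no_such_key(exc):
                    raise
                logger.warning("Event file deleted while reading: %s", path)
                break
            yield record
```

With this shape, records already read from a file are kept, and files that still exist continue to be processed after a deleted one is encountered.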
To clarify, is the issue precisely the same as #2634? i.e. deleted events cause a crash instead of being ignored or handled gracefully somehow? But this is particularly how this issue manifests with the S3 filesystem?
Just to set expectations, support for S3 filesystem is best-effort, so I doubt we'll prioritize this, but I'll check with the team.
Ah, and can you clarify if this is also when TensorFlow is not installed, like in #2634? Does installing TensorFlow work around the issue?
To clarify, is the issue precisely the same as https://github.com/tensorflow/tensorboard/issues/2634? i.e. deleted events cause a crash instead of being ignored or handled gracefully somehow? But this is particularly how this issue manifests with the S3 filesystem?
Yes, this is exactly the same issue, and it also occurs with the S3 filesystem.
Ah, and can you clarify if this is also when TensorFlow is not installed, like in https://github.com/tensorflow/tensorboard/issues/2634? Does installing TensorFlow work around the issue?
Native TF is not installed in this case, so TensorBoard is using the stub version for I/O operations. Let me try it out with a compatible TF installed. Thanks for the suggestions!
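For anyone checking the same thing in their own environment, a quick way to see whether TensorFlow is importable is sketched below. It assumes, as discussed above, that TensorBoard uses its stub I/O layer when TensorFlow is absent. The helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


# Run this in the same environment that launches TensorBoard.
print("tensorflow importable:", module_available("tensorflow"))
print("tensorboard importable:", module_available("tensorboard"))
```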