fiftyone
fiftyone copied to clipboard
[BUG] Errror: I/O operation on closed file with progress bars enabled
Instructions
Thank you for submitting an issue. Please refer to our issue policy for information on what types of issues we address.
- Please fill in this template to ensure a timely and thorough response
- Place an "x" between the brackets next to an option if it applies. For example:
- [x] Selected option
- Please delete everything above this line before submitting the issue
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04
- Python version (
python --version): 3.11.4 - FiftyOne version (
fiftyone --version): 0.21.6 - FiftyOne installed from (pip or source): pip
Commands to reproduce
# commands here
Describe the problem
Describe the problem clearly here. Include descriptions of the expected behavior and the actual behavior.
Code to reproduce issue
filepath = '/path/to/image.jpg'
samples = [fo.Sample(filepath=filepath)
dataset.add_samples(samples) # Success
samples = [fo.Sample(filepath=filepath)
dataset.add_samples(samples) # Raised an Error
Other info/logs
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 1820, in full_dispatch_request
rv = self.dispatch_request()
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/flask/app.py", line 1796, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/daat-1.0.0-py3.11.egg/daat/app/routes/datasets.py", line 163, in load_from_folder
DAATDataset.load_data_from_folder(dataset_name, folder_name)
File "/usr/local/lib/python3.11/site-packages/daat-1.0.0-py3.11.egg/daat/fiftyone_utils/utils.py", line 131, in load_data_from_folder
cls.load_data_from_mnt(dataset_name, dir_path=folder.path)
File "/usr/local/lib/python3.11/site-packages/daat-1.0.0-py3.11.egg/daat/fiftyone_utils/utils.py", line 125, in load_data_from_mnt
dataset.add_samples(samples)
File "/usr/local/lib/python3.11/site-packages/fiftyone/core/dataset.py", line 2480, in add_samples
with batcher:
File "/usr/local/lib/python3.11/site-packages/fiftyone/core/utils.py", line 1090, in __exit__
self._pb.__exit__(*args)
File "/usr/local/lib/python3.11/site-packages/eta/core/utils.py", line 1673, in __exit__
self.close(*args)
File "/usr/local/lib/python3.11/site-packages/eta/core/utils.py", line 1853, in close
self._draw(force=True, last=True)
File "/usr/local/lib/python3.11/site-packages/eta/core/utils.py", line 1973, in _draw
self.pause()
File "/usr/local/lib/python3.11/site-packages/eta/core/utils.py", line 1921, in pause
sys.stdout.write("\r" + " " * self._max_len + "\r")
ValueError: I/O operation on closed file
What areas of FiftyOne does this bug affect?
- [ ]
App: FiftyOne application issue - [x]
Core: Core Python library issue - [ ]
Server: FiftyOne server issue
Willingness to contribute
The FiftyOne Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the FiftyOne codebase?
- [x] Yes. I can contribute a fix for this bug independently
- [x] Yes. I would be willing to contribute a fix for this bug with guidance from the FiftyOne community
- [ ] No. I cannot contribute a bug fix at this time
Hi @stevezkw1998. This doesn't look related to add_samples() but rather an issue with running FiftyOne inside your Flask app. Perhaps setting FIFTYONE_SHOW_PROGRESS_BARS=false to disabled progress bar IO will solve the issue
Hi @benjaminpkane Thank you for your tips
I thought it should be eta lib issues, and it seems that add_samples() has long time not to be maintainced.
Can the method merge_samples completely cover all functionalities of add_samples() ?
There is no issue with add_samples() at the moment. merge_samples() could be used in place of add_samples(), but merge_samples() uses a merge key (by default filepath), so behavior will be different.
import fiftyone as fo
filepath = 'image.jpg'
samples = [fo.Sample(filepath=filepath)]
add = fo.Dataset("add")
add.add_samples(samples)
add.add_samples(samples)
merge = fo.Dataset("merge")
merge.merge_samples(samples)
merge.merge_samples(samples)
print(len(add), len(merge))
Out:
2 1
The code snippet you shared has no issues with it. Adding a sample to a dataset twice will result in two new samples.
The stacktrace you shared shows an issue with stdio and the progress bar in your flask server. My recommendation is to turn off the progress bar with FIFTYONE_SHOW_PROGRESS_BARS=false. That is the only issue I see with your code.
Hi @benjaminpkane thank you for providing easy-to-understand guidance
I will try using FIFTYONE_SHOW_PROGRESS_BARS=false as an workaround, but I still think this might be the issue with the lib eta
By the way I will complete this issue if FIFTYONE_SHOW_PROGRESS_BARS=false works, and thank you for unblocking me.
I will dive deep into 'eta' and keep you updated if I find the right way to solve this issue fundamentally.
I get this I/O operation on closed file all the time because of fiftyone. I kept isolating it away and finally can't see any other code or packages left that could be doing it.
This ruins the entire process and all fail with this error.
I think fiftyone must be wrapping stdout and causing issues. This means that fiftyone is not thread safe.
But it could be just a package via lib that does this that for me only fiftyone uses.
I checked all python level code and find no code left that could cause it (wrapping of stdout/stderr).
My view is probably eta is not the problem, it's just symptom of the lack of thread safety in something else that wraps stdout/stdin. This very well could be the progress bars, but doesn't seem to be in python code.
Anything like this is bad:
stdin = sys.stdin
stdout = sys.stdout
sys.stdin = open(os.devnull)
sys.stdout = open(os.devnull, "w")
or:
@contextlib.contextmanager
def mute_stdout():
stdout = sys.stdout
sys.stdout = open(os.devnull, "w")
try:
yield
finally:
sys.stdout = stdout
or:
class GrabStdout:
def __init__(self):
self.sys_stdout = sys.stdout
self.data = ''
sys.stdout = self
def write (self, data):
self.sys_stdout.write(data)
self.data += data
def flush (self):
self.sys_stdout.flush()
def restore(self):
sys.stdout = self.sys_stdout
or:
global wrapped_stdout, wrapped_stderr
global orig_stdout, orig_stderr
orig_stdout = sys.stdout
orig_stderr = sys.stderr
if sys.stdout is None:
wrapped_stdout = None
else:
sys.stdout = wrapped_stdout = \
wrap_stream(orig_stdout, convert, strip, autoreset, wrap)
if sys.stderr is None:
wrapped_stderr = None
else:
sys.stderr = wrapped_stderr = \
wrap_stream(orig_stderr, convert, strip, autoreset, wrap)
or
def _enable_redirect_io(self) -> None:
"""Enable redirecting of stdout / stderr."""
if self.console.is_terminal or self.console.is_jupyter:
if self._redirect_stdout and not isinstance(sys.stdout, FileProxy):
self._restore_stdout = sys.stdout
sys.stdout = cast("TextIO", FileProxy(self.console, sys.stdout))
if self._redirect_stderr and not isinstance(sys.stderr, FileProxy):
self._restore_stderr = sys.stderr
sys.stderr = cast("TextIO", FileProxy(self.console, sys.stderr))
def _disable_redirect_io(self) -> None:
"""Disable redirecting of stdout / stderr."""
if self._restore_stdout:
sys.stdout = cast("TextIO", self._restore_stdout)
self._restore_stdout = None
if self._restore_stderr:
sys.stderr = cast("TextIO", self._restore_stderr)
self._restore_stderr = None
All these are not thread safe and used by anyio, numpy, colorama, rich, etc. and should be avoided.
For the venv setup, I see this done by anyio
Just to reiterate, setting:
export FIFTYONE_SHOW_PROGRESS_BARS=false
will disable all progress bars in FiftyOne and thus prevent any stdout capture from happening. If you're operating in a concurrent environment, chances are you don't want/care about progress bars anyway, so that seems appropriate.
As for the I/O error that occurring:
- Here's where our progress bar class is implemented: https://github.com/voxel51/eta/blob/57a39e032cbb25781c9bece14e31a8348399e8ad/eta/core/utils.py#L1490
- Here's the utility class that handles stdout capture: https://github.com/voxel51/eta/blob/57a39e032cbb25781c9bece14e31a8348399e8ad/eta/core/utils.py#L1355
@pseudotensor seems like you have some expertise here, any chance you can spot a fix to make these classes thread safe?