audioread

stream not properly closed (NoBackendError | OSError: Too many open files)

Open ne555 opened this issue 2 years ago • 4 comments

import audioread

filename = 'some_audio_file.ogg'
try:
    for K in range(2048):
        with audioread.audio_open(filename) as audio:
            print(K, audio.duration)
except audioread.exceptions.NoBackendError:
    with open(filename, 'rb') as file: # OSError: [Errno 24] Too many open files
        pass

The audio is opened in a loop using with, but it seems that it is not properly closed. On my system it prints about 500 lines, then raises NoBackendError, followed by the OSError when trying to open a new file.

lsof shows lines of the following form (my open-file limit is 1024):

COMMAND   PID    USER   FD      TYPE             DEVICE SIZE/OFF     NODE NAME
python  38124 ne555 1023u     unix 0x00000000d813f451      0t0  2420780 type=STREAM (CONNECTED)

I've observed this issue with the audioread.gstdec.GstAudioFile and audioread.ffdec.FFmpegAudioFile backends; I couldn't test with audioread.rawread.RawAudioFile.
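As a complement to lsof, the leak can also be measured from inside the process. The following is a minimal sketch, not part of the original report: the helper names `count_open_fds` and `fd_growth` are hypothetical, and it assumes a system where `/dev/fd` lists the calling process's descriptors (Linux and macOS).

```python
import os


def count_open_fds():
    """Return how many file descriptors this process currently has open."""
    # /dev/fd lists the calling process's descriptors on Linux and macOS.
    # Listing it may briefly open one extra descriptor, so compare counts
    # rather than trusting the absolute number.
    return len(os.listdir("/dev/fd"))


def fd_growth(action, repeats=100):
    """Call action() `repeats` times and return the change in open descriptors."""
    before = count_open_fds()
    for _ in range(repeats):
        action()
    return count_open_fds() - before
```

For the script above, `action` could be a small function that opens the file with `audioread.audio_open` inside a with block and reads `audio.duration`. A result near zero suggests the backend cleans up after itself, while growth proportional to `repeats` points to leaked descriptors.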

ne555 • May 23 '23 11:05

Thanks for the complete script for testing this! I wasn't able to reproduce this with some quick testing (macOS, FFmpeg backend, using lsof to check if things went out of control). I don't have access to something using GStreamer at the moment, but I would very much believe that that backend could have some kind of a leak.

For FFmpeg in particular, is there any chance you can check whether the loopy script also leaves a bunch of ffmpeg processes running? Like, I can imagine ps | grep ffmpeg showing 1024 processes if we're not correctly cleaning things up there.

sampsyo • May 24 '23 17:05

Good morning. I hadn't realised that it was trying GstAudioFile before trying to open the file with FFmpegAudioFile. When I limited the backends to only FFmpegAudioFile, it worked fine.

So the issue seems to be only with the GstAudioFile backend (whether or not it opens the file correctly).
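For anyone wanting to apply the same workaround, here is a hedged sketch of limiting audioread to the FFmpeg backend. It assumes an audioread version whose `audio_open` accepts a `backends` argument and an `ffmpeg` executable on the PATH; the function name `ffmpeg_only_duration` is made up for illustration.

```python
def ffmpeg_only_duration(path):
    """Return the duration in seconds, decoding with the FFmpeg backend only."""
    # Imported lazily so that defining this helper does not itself
    # require audioread to be installed.
    import audioread
    from audioread import ffdec

    # `backends` restricts which decoder classes audio_open may try,
    # so GstAudioFile is never instantiated.
    with audioread.audio_open(path, backends=[ffdec.FFmpegAudioFile]) as audio:
        return audio.duration
```

Calling `ffmpeg_only_duration('some_audio_file.ogg')` in the loop from the original script, in place of the plain `audio_open` call, should then bypass the GStreamer code path entirely.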

ne555 • May 30 '23 12:05

Ah, that makes sense! I unfortunately don't have a great way to test this out here… I can imagine that we may need to do some additional explicit resource cleanup here: https://github.com/beetbox/audioread/blob/ff9535df934c48038af7be9617fdebb12078cc07/audioread/gstdec.py#L378-L406

But it will require some real GStreamer expertise or trial and error to figure out exactly what to clean up.

sampsyo • May 30 '23 20:05

I have run into the same problem while using librosa.load. Passing an open file object instead of the path helped:

with open(path, "rb") as fp:
    librosa.load(fp)

nikita-petrashen • Jul 22 '24 18:07