tfio.audio.AudioIOTensor is not thread safe
tfio.audio.AudioIOTensor, or more likely the underlying audio libraries it relies on, does not appear to be thread safe.
Here is a simple example:
import tensorflow as tf
import tensorflow_io as tfio

def read_audio_file(filename: str, dtype: str):
    audio = tfio.audio.AudioIOTensor(filename, dtype=dtype)
    return audio._resource, audio.dtype, audio.rate
from pathlib import Path
filenames = Path('somedirectory').glob('*.wav')
filenames = [str(x) for x in filenames]
ds = tf.data.Dataset.from_tensor_slices(filenames)
ds = ds.map(lambda x: read_audio_file(x, dtype='int16'), num_parallel_calls=4)
# only include samples that match our desired sample rate
ds = ds.filter(lambda x,y,z: z==44100)
Assuming the directory has files with different sample rates, this is the quickest way to show the problem:
expected_count = ...  # placeholder: the number of 44.1 kHz files in that directory
assert len(list(ds)) == expected_count
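The expected count in the assertion above can be obtained without going through tfio at all. Here is a minimal sketch using only the Python standard library `wave` module. The helper name `count_wav_files_at_rate` is made up for illustration, and it assumes the files are plain PCM WAV (the `wave` module raises `wave.Error` on formats it does not support).

```python
import wave
from pathlib import Path


def count_wav_files_at_rate(directory: str, rate: int = 44100) -> int:
    """Count the .wav files in `directory` whose header reports `rate` Hz.

    Hypothetical helper: reads headers one file at a time in a single
    thread, so the result does not depend on tfio or on parallelism.
    """
    count = 0
    for path in sorted(Path(directory).glob('*.wav')):
        # wave.open only parses PCM WAV files; other encodings raise wave.Error
        with wave.open(str(path), 'rb') as wav:
            if wav.getframerate() == rate:
                count += 1
    return count
```

Something like `count_wav_files_at_rate('somedirectory')` then gives a single-threaded reference value to compare `len(list(ds))` against.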
Every invocation of this code will likely yield a different number of files. I don't understand the underlying code well enough, but I suspect that the external libraries used for reading WAV files have thread-safety issues.
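To make the run-to-run variation easier to see, one option is to tally the result over several runs. This is only a sketch: `tally_results` and `run_once` are hypothetical names, and `run_once` stands for any zero-argument function that rebuilds the dataset and returns `len(list(ds))`.

```python
from collections import Counter
from typing import Callable


def tally_results(run_once: Callable[[], int], runs: int = 10) -> Counter:
    """Call `run_once` repeatedly and tally how often each result occurs.

    A deterministic pipeline produces a Counter with a single key;
    several keys mean the result changes between invocations.
    """
    return Counter(run_once() for _ in range(runs))
```

Comparing the tally for `num_parallel_calls=4` with the tally for `num_parallel_calls=1` should indicate whether the variation really comes from the parallel map, which is what the thread-safety suspicion predicts.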
Any update on this?