wav_io needs to accept wav files that have a 'JUNK' chunk before the 'fmt ' chunk
This issue is from https://github.com/tensorflow/tensorflow/issues/26247#issuecomment-900226218 : (cc @MemoonaTahira)
I am doing transfer learning for audio following this tutorial, and a few of my wav files have a 'BEXT' chunk, which makes decoding throw an error:
2021-08-17 16:13:21.912804: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at audio_video_wav_kernels.cc:315 : Out of range: EOF reached
2021-08-17 16:13:21.916281: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Data too short when trying to read string
Traceback (most recent call last):
  File "C:\Users\Mona\anaconda3\envs\lisnen_work\lib\contextlib.py", line 135, in exit
    self.gen.throw(type, value, traceback)
  File "C:\Users\Mona\anaconda3\envs\lisnen_work\lib\site-packages\tensorflow\python\ops\variable_scope.py", line 2833, in variable_creator_scope
    yield
  File "C:\Users\Mona\anaconda3\envs\lisnen_work\lib\site-packages\keras\engine\training.py", line 1184, in fit
    tmp_logs = self.train_function(iterator)
  File "C:\Users\Mona\anaconda3\envs\lisnen_work\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in call
    result = self._call(*args, **kwds)
  File "C:\Users\Mona\anaconda3\envs\lisnen_work\lib\site-packages\tensorflow\python\eager\def_function.py", line 917, in _call
    return self._stateless_fn(*args, **kwds) # pylint: disable=not-callable
  File "C:\Users\Mona\anaconda3\envs\lisnen_work\lib\site-packages\tensorflow\python\eager\function.py", line 3039, in call
    return graph_function._call_flat(
  File "C:\Users\Mona\anaconda3\envs\lisnen_work\lib\site-packages\tensorflow\python\eager\function.py", line 1963, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "C:\Users\Mona\anaconda3\envs\lisnen_work\lib\site-packages\tensorflow\python\eager\function.py", line 591, in call
    outputs = execute.execute(
  File "C:\Users\Mona\anaconda3\envs\lisnen_work\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Function invocation produced OutOfRangeError: EOF reached
     [[{{node PartitionedCall/IO>AudioDecodeWAV}}]]
     [[IteratorGetNext]]
  (1) Invalid argument: Function invocation produced OutOfRangeError: EOF reached
     [[{{node PartitionedCall/IO>AudioDecodeWAV}}]]
     [[IteratorGetNext]]
     [[IteratorGetNext/_2]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_31415]
Function call stack: train_function -> train_function
2021-08-17 16:13:29.065415: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected fmt but found bext
(I am using tensorflow-gpu 2.6 and I tried first with just tfio 0.20.0, and then after installing tensorflow-io-nightly 0.20.0.dev20210815170710)
I have recreated the function load_wav_16k_mono(), as discussed here on this thread, with tfio.audio.decode_wav instead of tf.audio.decode_wav.
Both functions, load_wav_16k_mono() and load_wav_16k_mono_modified(), show the exact same output in the debugger. However, when I use audio files processed through the modified function for training, I still get the same error.
Here is the full code:
import tensorflow as tf
import tensorflow_io as tfio


@tf.function
def load_wav_16k_mono(filename):
    """ Load a WAV file, convert it to a float tensor, resample to 16 kHz single-channel audio. """
    file_contents = tf.io.read_file(filename)
    wav, sample_rate = tf.audio.decode_wav(
        file_contents,
        desired_channels=1)
    wav = tf.squeeze(wav, axis=-1)
    sample_rate = tf.cast(sample_rate, dtype=tf.int64)
    wav = tfio.audio.resample(wav, rate_in=sample_rate, rate_out=16000)
    return wav


@tf.function
def load_wav_16k_mono_modified(filename):
    file_contents = tf.io.read_file(filename)
    wav = tfio.audio.decode_wav(file_contents, dtype=tf.int16)
    wav = wav[:, 0]
    wav = tf.cast(wav, tf.float32)
    _, sample_rate = tf.audio.decode_wav(file_contents, desired_channels=1)
    sample_rate = tf.cast(sample_rate, dtype=tf.int64)
    wav = tfio.audio.resample(wav, rate_in=sample_rate, rate_out=16000)
    return wav


testing_wav_file_name = tf.keras.utils.get_file('miaow_16k.wav',
                                                'https://storage.googleapis.com/audioset/miaow_16k.wav',
                                                cache_dir='./',
                                                cache_subdir='test_data')
load_wav_16k_mono (testing_wav_file_name)
load_wav_16k_mono_modified (testing_wav_file_name)
I also saw this code in one of the issues. It handles bext chunks, but perhaps tfio.audio.decode_wav does not?
Thank you for opening the issue here. Following.
Update:
In the load_wav_16k_mono_modified() function, the decoded wav isn't normalized, unlike the output of tf.audio.decode_wav, which normalizes by default. This works:
wav = tf.cast(wav, tf.float32) / 32768.0
but the bext chunk issue is still there.
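For readers unsure what the division by 32768.0 does, here is a minimal stdlib-only sketch of the same scaling, outside TensorFlow. The helper name pcm16_to_float is hypothetical and only illustrates the arithmetic: signed 16-bit samples in the range -32768 to 32767 are mapped to floats in roughly -1.0 to 1.0.

```python
import struct


def pcm16_to_float(raw_bytes):
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0).

    Hypothetical helper for illustration; it mirrors dividing the int16
    samples by 32768.0 as done in the update above.
    """
    count = len(raw_bytes) // 2
    samples = struct.unpack("<%dh" % count, raw_bytes[:count * 2])
    return [sample / 32768.0 for sample in samples]


if __name__ == "__main__":
    raw = struct.pack("<3h", -32768, 0, 16384)
    print(pcm16_to_float(raw))
```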
@MemoonaTahira Are you encountering the issue on Windows only? I tried but didn't encounter the same issue on Linux.
Yes I am using Windows.
Python wave also seems to skip these chunks. Actually this is what I am using to filter out the files:
import os
import wave

# filename is the path to one .wav file
with wave.open(filename, 'r') as fin:
    # size of the audio data plus the 44-byte canonical header
    header_fsize = (fin.getnframes() * fin.getnchannels() * fin.getsampwidth()) + 44
file_fsize = os.path.getsize(filename)
if header_fsize != file_fsize:
    print("Found a file with extra chunks: ", filename)
    print(header_fsize, file_fsize)
Next, I did a quick fix by opening the files flagged by the above code and rewriting them with librosa. Do you think this is a Windows-specific problem?
Edit: Since wave only gets the size of the data and the code adds 44 bytes for the canonical header (RIFF header, fmt chunk, and data chunk header), this only detects extra chunks by comparing that value to the actual file size. It doesn't fix the order of chunks. And I suspect librosa only rearranges chunks so that fmt comes before bext or junk.
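The size comparison cannot tell where an extra chunk sits relative to fmt. A minimal sketch that lists the chunks in file order may help here. It assumes a standard RIFF/WAVE layout (8-byte chunk headers, payloads padded to an even length), and the function name list_riff_chunks is hypothetical, not part of wave, tf.audio, or tfio.audio.

```python
import struct


def list_riff_chunks(path):
    """Return [(chunk_id, size), ...] in file order for a RIFF/WAVE file."""
    chunks = []
    with open(path, "rb") as f:
        head = f.read(12)
        if len(head) < 12 or head[:4] != b"RIFF" or head[8:12] != b"WAVE":
            raise ValueError("not a RIFF/WAVE file: %s" % path)
        while True:
            header = f.read(8)
            if len(header) < 8:
                break  # end of file (or a truncated trailing header)
            chunk_id, size = struct.unpack("<4sI", header)
            chunks.append((chunk_id.decode("ascii", "replace"), size))
            # Skip the payload; chunks are padded to an even number of bytes.
            f.seek(size + (size & 1), 1)
    return chunks


if __name__ == "__main__":
    import sys
    for chunk_id, size in list_riff_chunks(sys.argv[1]):
        print(repr(chunk_id), size)
```

If 'JUNK' or 'bext' is printed before 'fmt ', the file has the chunk order discussed in this issue.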
Did the file you use to check this have its bext chunk before or after the fmt chunk?
Sorry for my two-and-a-half-years-late comment (I was the person who opened https://github.com/tensorflow/tensorflow/issues/26247 but left silently). Using @MemoonaTahira's code with @carlthome's example wav file ( https://github.com/tensorflow/tensorflow/issues/26247#issuecomment-547475943 ), I'm still facing a similar issue on my Linux box.
2021-11-02 00:15:56.116943: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected fmt but found JUNK
2021-11-02 00:15:56.118774: W tensorflow/core/framework/op_kernel.cc:1692] OP_REQUIRES failed at decode_wav_op.cc:55 : Invalid argument: Header mismatch: Expected fmt but found JUNK
Traceback (most recent call last):
  File "a.py", line 38, in <module>
    load_wav_16k_mono (testing_wav_file_name)
  File "/work/venv3.7/gpu/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/work/venv3.7/gpu/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 957, in _call
    filtered_flat_args, self._concrete_stateful_fn.captured_inputs) # pylint: disable=protected-access
  File "/work/venv3.7/gpu/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1964, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/work/venv3.7/gpu/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/work/venv3.7/gpu/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Header mismatch: Expected fmt but found JUNK
     [[node DecodeWav (defined at a.py:9) ]]
  (1) Invalid argument: Header mismatch: Expected fmt but found JUNK
     [[node DecodeWav (defined at a.py:9) ]]
     [[DecodeWav/_12]]
0 successful operations.
0 derived errors ignored. [Op:__inference_load_wav_16k_mono_195]
Errors may have originated from an input operation.
Input Source operations connected to node DecodeWav:
ReadFile (defined at a.py:8)
Input Source operations connected to node DecodeWav:
ReadFile (defined at a.py:8)
Function call stack:
load_wav_16k_mono -> load_wav_16k_mono
Part of the pip list here:
tensorflow-estimator 2.6.0
tensorflow-gpu 2.6.0
tensorflow-io 0.21.0
tensorflow-io-gcs-filesystem 0.21.0
Just in case: I commented out tf.keras.utils.get_file() and instead just put the path to the wav file which I downloaded from @carlthome's comment. (If you unzip the example.zip attached to that comment, you get example.wav.)
32,37c32,36
< #testing_wav_file_name = tf.keras.utils.get_file('miaow_16k.wav',
< # 'https://storage.googleapis.com/audioset/miaow_16k.wav',
< # cache_dir='./',
< # cache_subdir='test_data')
< #
< testing_wav_file_name = 'example.wav'
---
> testing_wav_file_name = tf.keras.utils.get_file('miaow_16k.wav',
> 'https://storage.googleapis.com/audioset/miaow_16k.wav',
> cache_dir='./',
> cache_subdir='test_data')
>
Hello, I would recommend using librosa and soundfile to rewrite the files without the extra chunks first. This is the only quick fix that worked for me if you are planning to use tf.audio, which is what the official TensorFlow audio tutorial here uses.
I was hoping for input from @yongtang but this is what I understand is causing the problem. (Adding details again for any newcomers):
The wave library expects the FMT chunk to be the first chunk in the audio. So if you use tfio.audio (first function), it works when any JUNK or BEXT chunk comes after the FMT chunk; it will skip it. But if your chunks are arranged so that BEXT or JUNK comes BEFORE FMT, even the second code does not work. The wave file is still valid as long as FMT comes before the DATA chunk, regardless of other chunks.
If it helps, please do this: it rewrites your audio files without any of the extra chunks, leaving only FMT and DATA:
import glob
import os
import shutil
import wave

import librosa
import soundfile as sf


def check_chunks(DIR, CORRUPT_DIR):
    files_list = glob.glob(DIR + '/**/*.wav', recursive=True)
    i = 0
    for filename in files_list:
        try:
            # get size of the audio data and add the 44-byte canonical header
            with wave.open(filename, 'r') as fin:
                header_fsize = (fin.getnframes() * fin.getnchannels() * fin.getsampwidth()) + 44
            # get actual filesize
            file_fsize = os.path.getsize(filename)
            # compare both sizes
            if header_fsize != file_fsize:
                print("Found a file with extra chunks: ", filename)
                print(header_fsize, file_fsize)
                i = i + 1
                new_full_filename = os.path.splitext(filename)  # just separate out wav from name
                new_filename = new_full_filename[0] + "_processed" + ".wav"
                # rewrite the audio as 16 kHz, 16-bit PCM (this drops the extra chunks)
                audio_in, sr = librosa.load(filename, sr=16000)
                sf.write(new_filename, audio_in, 16000, subtype='PCM_16')
                print('File fixed and saved as: ', new_filename)
                # move the original file to another folder
                try:
                    FIXED_CHUNK_DIR = os.path.join(CORRUPT_DIR, "fixed_chunks")
                    os.makedirs(FIXED_CHUNK_DIR, exist_ok=True)
                    shutil.move(filename, FIXED_CHUNK_DIR)
                    print('Moving original file to: ', FIXED_CHUNK_DIR)
                # sometimes, the original file fails to move, but the processed file is successfully created
                except Exception:
                    print("Original file cannot be moved, please move manually")
                    continue
        # in case the file isn't fixed:
        except Exception:
            # trying to at least move it out of the correct sound file folder
            try:
                CORRUPT_CHUNK_DIR = os.path.join(CORRUPT_DIR, "corrupt_chunks")
                os.makedirs(CORRUPT_CHUNK_DIR, exist_ok=True)
                print("This file cannot be fixed: ", filename, "\nMoving it to ", CORRUPT_CHUNK_DIR)
                shutil.move(filename, CORRUPT_CHUNK_DIR)
                # os.remove(filename)
            # if the unfixed file refuses to move
            except Exception:
                print("This file cannot be fixed: ", filename, "\nPlease remove manually")
                continue
            continue
    print("done")
    return
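If resampling is not wanted, a different, stdlib-only approach is to copy the file while keeping only the fmt and data chunks. The following is a rough sketch of that idea, not the librosa method above: the name keep_fmt_and_data is hypothetical, it assumes the file is small enough to read into memory, and it does not convert WAVE_FORMAT_EXTENSIBLE files to plain PCM.

```python
import struct


def keep_fmt_and_data(src_path, dst_path):
    """Write a copy of src_path containing only 'fmt ' then 'data' chunks."""
    with open(src_path, "rb") as f:
        blob = f.read()
    if len(blob) < 12 or blob[:4] != b"RIFF" or blob[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file: %s" % src_path)

    kept = {}
    pos = 12
    while pos + 8 <= len(blob):
        chunk_id, size = struct.unpack_from("<4sI", blob, pos)
        payload = blob[pos + 8:pos + 8 + size]
        # Keep the first 'fmt ' and 'data' chunks; drop JUNK, bext, LIST, ...
        if chunk_id in (b"fmt ", b"data") and chunk_id not in kept:
            kept[chunk_id] = payload
        pos += 8 + size + (size & 1)  # payloads are padded to even length

    if b"fmt " not in kept or b"data" not in kept:
        raise ValueError("missing 'fmt ' or 'data' chunk: %s" % src_path)

    body = b"WAVE"
    for chunk_id in (b"fmt ", b"data"):
        payload = kept[chunk_id]
        body += chunk_id + struct.pack("<I", len(payload)) + payload
        if len(payload) & 1:
            body += b"\x00"  # restore the padding byte
    with open(dst_path, "wb") as f:
        f.write(b"RIFF" + struct.pack("<I", len(body)) + body)
```

For a 16-bit PCM input, the output then satisfies the size check used in check_chunks (data size plus 44 bytes).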
Thank you @MemoonaTahira for sharing your workarounds. After digging into several pull requests and commits, I think this will be solved once this commit is released in tensorflow, not tensorflow_io:
https://github.com/tensorflow/tensorflow/commit/dd5a59b7ada3afb69a103ec52b9ef92f1362c42e
tfio.audio.decode_wav() was already fixed by #594, as you mentioned, but tensorflow's tf.audio.decode_wav() was not fixed, and the tutorial depended on it... So this was not about the OS, nor about tfio.audio, but about tf.audio.
With the tensorflow==2.7.0 release and tensorflow-io==0.22.0, I'm now facing Bad format chunk size for WAV: Expected 16 or 18, but got40 when using the setup given in my previous comment ( https://github.com/tensorflow/io/issues/1503#issuecomment-956328006 ).
As posted by @foxik in https://github.com/tensorflow/tensorflow/issues/36823#issuecomment-587190239 , it is very unfortunate, but wav files with WAVE_FORMAT_EXTENSIBLE are officially declared out of support. I'm not sure whether the wav file @MemoonaTahira is trying to use is also WAVE_FORMAT_EXTENSIBLE or not.
:cry:
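One way to check whether a file is WAVE_FORMAT_EXTENSIBLE is to read its fmt chunk directly. Such files carry a 40-byte fmt chunk with format tag 0xFFFE, which would match the "got40" in the error above. The sketch below assumes a standard RIFF/WAVE layout; read_fmt_info is a hypothetical helper, not a TensorFlow or tensorflow-io function.

```python
import struct

WAVE_FORMAT_PCM = 0x0001
WAVE_FORMAT_EXTENSIBLE = 0xFFFE


def read_fmt_info(path):
    """Return (fmt_chunk_size, format_tag, bits_per_sample) for a WAV file."""
    with open(path, "rb") as f:
        head = f.read(12)
        if len(head) < 12 or head[:4] != b"RIFF" or head[8:12] != b"WAVE":
            raise ValueError("not a RIFF/WAVE file: %s" % path)
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found: %s" % path)
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                fields = f.read(16)
                if len(fields) < 16:
                    raise ValueError("truncated 'fmt ' chunk: %s" % path)
                # tag, channels, sample rate, byte rate, block align, bits
                tag, _, _, _, _, bits = struct.unpack("<HHIIHH", fields)
                return size, tag, bits
            # Not fmt: skip this chunk (payloads are padded to even length).
            f.seek(size + (size & 1), 1)


if __name__ == "__main__":
    import sys
    size, tag, bits = read_fmt_info(sys.argv[1])
    print("fmt size:", size, "format tag:", hex(tag), "bits:", bits)
    if tag == WAVE_FORMAT_EXTENSIBLE:
        print("WAVE_FORMAT_EXTENSIBLE file")
```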
Hi! Yes, I had some audio files in extensible format. This is easy to fix: the file is not in 16-bit PCM, so rewrite it to 16-bit PCM (i.e. resample it). After that, regardless of chunks, tfio.audio will work. You'll end up using Librosa :3
The reason is: tf.audio.decode_wav can read any PCM modulated file and resample it, but it can't handle out-of-order BEXT/JUNK chunks, whereas tfio.audio.decode_wav needs the dtype parameter, as in tfio.audio.decode_wav(file_contents, dtype=tf.int16), which assumes the file is in 16-bit PCM. Once the files are in 16-bit, tfio.audio won't give you the chunk issue.
Edit: You were right about tfio.audio working with out-of-order extra chunks; I tried that again later. I have edited my comment above to give a solution for tf.audio now. It was this exact error, Bad format chunk size for WAV: Expected 16 or 18, but got40, which made me think tfio.audio wasn't working either. Anyway, I tried keeping the code exclusively in TensorFlow by looking for a way to resample the file first without using the decode function, but it is a chicken-and-egg problem, and I ended up using Librosa here too.
Personal opinion: converting to 16-bit PCM before using the tf function to decode should make model training faster. I haven't timed it, but if you convert to 16-bit PCM beforehand and save the dataset/audio files once, you cut down on the time spent processing the files during training when the tf function is called. Prefetch helps though.