tfio.audio.AudioIOTensor VS tf.audio.decode_wav
Does anyone know what the difference is? I am getting a significant drop in audio classification model performance (from 90% accuracy to 9%) the moment I switch from tf.audio.decode_wav to tfio.audio.AudioIOTensor, without changing any other line of code. I know that tf.audio.decode_wav normalises the output between -1 and 1, and I managed to mimic that by doing tfio.audio.AudioIOTensor(file).to_tensor() / 32768.0. I used all(tf.equal()) to verify that both produce exactly the same tensor with the exact same dtype (tf.float32), yet I still get this disparity in model performance. I tried various models, all with the same result.
Here is a short snippet showing how I used both. If I made any mistake in using them, please let me know.
TFIO way following the official doc:
audio = tfio.audio.AudioIOTensor('3153.wav', dtype=tf.int16)
audio_tensor = tf.cast(tf.squeeze(audio.to_tensor()), tf.float32) / 32768.0
TF way also following the official doc:
audio, _ = tf.audio.decode_wav(tf.io.read_file('3153.wav'))
waveform = tf.squeeze(audio, axis=-1)
I checked with all(tf.equal(audio_tensor, waveform)) and it returned True.
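For reference, Python's built-in all() iterates over the first axis of the tensor, so a single reduction op is a more direct equality check. A minimal sketch of the full comparison (assuming the same local 3153.wav as above):

```python
import tensorflow as tf
import tensorflow_io as tfio

# TFIO path: read raw int16 samples and scale to [-1, 1)
audio = tfio.audio.AudioIOTensor('3153.wav', dtype=tf.int16)
audio_tensor = tf.cast(tf.squeeze(audio.to_tensor()), tf.float32) / 32768.0

# TF path: decode_wav already returns float32 in [-1, 1)
decoded, _ = tf.audio.decode_wav(tf.io.read_file('3153.wav'))
waveform = tf.squeeze(decoded, axis=-1)

# Element-wise equality reduced to a single boolean
print(tf.reduce_all(tf.equal(audio_tensor, waveform)).numpy())
```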
You can try with any wav file you may have.
I'm also having issues with tfio.audio.AudioIOTensor that I don't see when using tf.audio.decode_wav. However, I managed to make tfio.audio.AudioIOTensor work by disabling parallel processing, so it may be a problem that only (sometimes) shows up when the loading is done in parallel. @aliencaocao Are you using parallel loading of the audio in your input pipeline? If so, try switching off parallel loading to see if it helps.
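For context, parallel loading in a tf.data pipeline usually comes from the num_parallel_calls argument to map. A hedged, untested sketch of what I mean by switching it off (the glob pattern and load_audio helper are placeholders, not from the original post):

```python
import tensorflow as tf
import tensorflow_io as tfio

def load_audio(path):
    # Hypothetical loader using AudioIOTensor; runs eagerly inside py_function
    audio = tfio.audio.AudioIOTensor(path.numpy().decode(), dtype=tf.int16)
    return tf.cast(tf.squeeze(audio.to_tensor()), tf.float32) / 32768.0

files = tf.data.Dataset.list_files('data/*.wav')

# Parallel version (may trigger the issue described above):
# ds = files.map(lambda p: tf.py_function(load_audio, [p], tf.float32),
#                num_parallel_calls=tf.data.AUTOTUNE)

# Sequential version: omit num_parallel_calls so map runs serially
ds = files.map(lambda p: tf.py_function(load_audio, [p], tf.float32))
```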
I'm actually not sure whether I am using parallel loading. I did not explicitly set anything related to parallel loading in the code, so unless it defaults to parallel loading, it should not be in use. Anyway, I worked around the problem by simply switching back to tf.audio.decode_wav. I guess tfio is not so stable yet.
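For completeness, a sketch of that fallback: tf.audio.decode_wav is a graph op, so it works directly inside dataset.map without a py_function wrapper, even with parallel calls (the glob pattern here is an assumption):

```python
import tensorflow as tf

def load_wav(path):
    # decode_wav runs in graph mode, so no py_function wrapper is needed
    audio, _ = tf.audio.decode_wav(tf.io.read_file(path))
    return tf.squeeze(audio, axis=-1)

ds = tf.data.Dataset.list_files('data/*.wav').map(
    load_wav, num_parallel_calls=tf.data.AUTOTUNE)
```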
Something I verified experimentally, and which caught me completely off guard, is that whenever you create two AudioIOTensors for two different audio files, the first instance actually starts reading from the file pointed to by the last one. I still have to check the documentation to find out whether this "singleton-like" behavior of AudioIOTensor is by design, but it is not what I would expect from the API (i.e. you have the option to create many AudioIOTensor objects).
Not sure if you are processing many audio files at the same time or not, but thought I'd leave this info here anyway.
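A minimal repro sketch of what I observed (a.wav and b.wav are placeholder file names; the expected result is True, but in the buggy case the first instance reads b.wav's samples instead):

```python
import tensorflow as tf
import tensorflow_io as tfio

# Merely constructing the second instance is enough to trigger the behavior
first = tfio.audio.AudioIOTensor('a.wav', dtype=tf.int16)
second = tfio.audio.AudioIOTensor('b.wav', dtype=tf.int16)

# Ground truth for a.wav via decode_wav
ref_a, _ = tf.audio.decode_wav(tf.io.read_file('a.wav'))

first_samples = tf.cast(tf.squeeze(first.to_tensor()), tf.float32) / 32768.0
print(tf.reduce_all(
    tf.equal(first_samples, tf.squeeze(ref_a, axis=-1))).numpy())
```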
AudioIOTensor does not seem to be thread-safe. I just encountered this when creating a dataset that used a parallel map.
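One possible mitigation, if the problem really is concurrent access: serialize just the AudioIOTensor reads with a lock while keeping the rest of the map parallel. An untested sketch, reusing the hypothetical loader from above:

```python
import threading
import tensorflow as tf
import tensorflow_io as tfio

_lock = threading.Lock()  # serializes AudioIOTensor access across map workers

def load_audio_locked(path):
    # Hold the lock while the (apparently non-thread-safe) AudioIOTensor
    # opens and reads the file
    with _lock:
        audio = tfio.audio.AudioIOTensor(path.numpy().decode(), dtype=tf.int16)
        samples = audio.to_tensor()
    return tf.cast(tf.squeeze(samples), tf.float32) / 32768.0

ds = tf.data.Dataset.list_files('data/*.wav').map(
    lambda p: tf.py_function(load_audio_locked, [p], tf.float32),
    num_parallel_calls=tf.data.AUTOTUNE)
```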
Any update on this?