
tfio.audio.AudioIOTensor VS tf.audio.decode_wav

Open aliencaocao opened this issue 4 years ago • 5 comments

Does anyone know what the difference is? I am seeing a significant drop in an audio classification model's performance (from 90% accuracy to 9%) the moment I switch from tf.audio.decode_wav to tfio.audio.AudioIOTensor, without changing a single other line of code. I know that tf.audio.decode_wav normalises the output to between -1 and 1, and I managed to mimic that operation by doing tfio.audio.AudioIOTensor(file).to_tensor() / 32768.0. I used all(tf.equal()) to verify that both produce exactly the same tensor with exactly the same dtype (tf.float32), yet I'm getting this disparity in model performance. I tried various models, all with the same result.

Here is just a short snippet of how I used both ways. If I made any mistake in using them, please let me know.

TFIO way, following the official doc:

audio = tfio.audio.AudioIOTensor('3153.wav', dtype=tf.int16)
audio_tensor = tf.cast(tf.squeeze(audio.to_tensor()), tf.float32) / 32768.0

TF way, also following the official doc:

audio, _ = tf.audio.decode_wav(tf.io.read_file('3153.wav'))
waveform = tf.squeeze(audio, axis=-1)

I then ran all(tf.equal(audio_tensor, waveform)), and it returned True.

You can try with any wav file you may have.
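For anyone who wants to cross-check the decoded values independently of both libraries, here is a minimal sketch using only the Python standard library. It assumes a mono, 16-bit PCM file. The helper names (write_test_wav, read_wav_normalized) and the synthetic sine-wave fixture are hypothetical and only there for illustration. It applies the same division by 32768.0 described above, so the result should land in the range [-1, 1).

```python
import array
import math
import sys
import wave


def write_test_wav(path, sample_rate=16000, n_samples=1600, freq_hz=440.0):
    """Write a mono 16-bit PCM sine wave (a hypothetical test fixture)."""
    samples = array.array(
        "h",
        (
            int(0.5 * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate))
            for i in range(n_samples)
        ),
    )
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is stored little-endian
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = int16
        wf.setframerate(sample_rate)
        wf.writeframes(samples.tobytes())


def read_wav_normalized(path):
    """Read a mono 16-bit PCM WAV and scale int16 samples to floats in [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()
    # int16 spans -32768..32767, so dividing by 32768.0 maps it into [-1, 1)
    return [s / 32768.0 for s in samples]
```

The list returned by read_wav_normalized can be compared against the output of either decoder for the same file, which gives a third reference point if the two disagree.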

aliencaocao avatar Jun 11 '21 15:06 aliencaocao

I'm also having issues with tfio.audio.AudioIOTensor that I don't see when using tf.audio.decode_wav. However, I managed to make tfio.audio.AudioIOTensor work when disabling parallel processing, so it may be a problem only observed (sometimes) when doing the loading in a parallel manner. @aliencaocao Are you using parallel loading of the audio in your input pipeline? If so, try to switch off parallel loading to see if it helps.
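To make "switch off parallel loading" concrete, here is a rough sketch of a tf.data pipeline where the map step can be toggled between sequential and parallel. It uses tf.audio.decode_wav only, so it does not reproduce the AudioIOTensor problem itself. The function names (load_wav, make_dataset) and the parallel flag are my own assumptions for illustration, not something from the thread's code.

```python
import tensorflow as tf


def load_wav(path):
    """Decode a 16-bit PCM WAV file to a 1-D float32 waveform."""
    audio, _ = tf.audio.decode_wav(tf.io.read_file(path), desired_channels=1)
    return tf.squeeze(audio, axis=-1)


def make_dataset(paths, parallel=False):
    """Build a dataset of waveforms from a list of WAV file paths.

    With parallel=False the map runs sequentially (num_parallel_calls=None);
    with parallel=True tf.data chooses the level of parallelism.
    """
    ds = tf.data.Dataset.from_tensor_slices(paths)
    calls = tf.data.AUTOTUNE if parallel else None
    return ds.map(load_wav, num_parallel_calls=calls)
```

Building the dataset both ways over the same files and comparing the elements is a quick way to see whether parallelism changes what the loader returns. Note that a plain ds.map(fn) with no num_parallel_calls argument runs sequentially.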

retunelars avatar Aug 19 '21 15:08 retunelars

I'm actually not sure whether I am using parallel loading or not. I did not explicitly set anything related to parallel loading in the code, so unless it defaults to parallel loading, it should not be in use. Anyway, I just worked around the problem by going back to the tf.audio decode. I guess tfio is not so stable yet.

aliencaocao avatar Aug 19 '21 19:08 aliencaocao

Something I verified experimentally, and which caught me completely off guard, is that whenever you create two AudioIOTensors for two different audio files, the first instance will actually start reading from the file pointed to by the last one. I still have to check the documentation to find out whether this "singleton-like" behavior of AudioIOTensor is by design, but it is not what I would expect from the API (i.e. you have the option to create many AudioIOTensor objects).

Not sure if you are processing many audio files at the same time or not, but thought I'd leave this info here anyway.

bflaton avatar Oct 05 '22 09:10 bflaton

AudioIOTensor does not seem to be thread safe. I just encountered that when creating a dataset that used a parallel map.

provos avatar Jan 29 '23 21:01 provos

Any update on this?

Mddct avatar Sep 12 '23 02:09 Mddct