Add ability to use `from_ffmpeg` as GraphIOTensor
The audio graph modes introduced in #615 greatly improve audio loading in tf.data pipelines. When external (ffmpeg) decoding is required (in addition to #650), it would be great if GraphIOTensor support could be added to from_ffmpeg as well, so that the following function could be used in a dataset.map(loadfromffmpeg):
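import tensorflow as tf
import tensorflow_io as tfio

@tf.function
def loadfromffmpeg(fp):
    # Proposed usage: graph-mode ffmpeg decoding, scaled from int16 to float32.
    audio = tfio.IOTensor.graph(tf.int16).from_ffmpeg(fp)
    return tf.cast(audio.to_tensor(), tf.float32) / 32767.0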
I don't know what the best API would be to support lazy loading through __getitem__, as is supported for from_audio. Partial (chunked) loading with ffmpeg using a start/duration given in samples will not be possible, since compressed formats such as mp3 do not support exact seeking.
I would therefore propose not to support indexing for now, but rather just add start and duration parameters, both given in seconds.
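To make the proposal a bit more concrete, here is a minimal sketch of how start and duration in seconds could map onto an ffmpeg invocation. It only builds the argument list and does not run anything. The helper name build_ffmpeg_command and its parameters are hypothetical and not part of tensorflow-io; the ffmpeg options used (-ss, -t, -i, -f s16le, -acodec pcm_s16le, pipe:1) are standard CLI options, but this is not necessarily how from_ffmpeg would be implemented internally.

```python
def build_ffmpeg_command(path, start=0.0, duration=None):
    """Build an ffmpeg argument list that decodes part of `path` to raw PCM.

    Hypothetical helper for illustration only (not a tensorflow-io API).
    `start` and `duration` are in seconds, as proposed above.
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    if duration is not None and duration <= 0:
        raise ValueError("duration must be positive")

    # -ss placed before -i seeks in the input before decoding starts.
    cmd = ["ffmpeg", "-ss", f"{start:.3f}"]
    if duration is not None:
        # -t placed before -i limits how much of the input is read.
        cmd += ["-t", f"{duration:.3f}"]
    # Decode to signed 16-bit little-endian PCM and write it to stdout.
    cmd += ["-i", path, "-f", "s16le", "-acodec", "pcm_s16le", "pipe:1"]
    return cmd


if __name__ == "__main__":
    print(build_ffmpeg_command("song.mp3", start=1.5, duration=2.0))
```

Passing seconds straight through avoids having to promise sample-exact offsets, which is the part compressed formats cannot guarantee.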
@yongtang did you check if this is feasible?
@faroit Yes, it is still in the works. There are several items to sort out: 1) the representation of time and how it could be used in __getitem__; 2) ffmpeg processing is ideally suited for Linux, but on other platforms (Windows and macOS) it might be better to go with a platform-specific codec. I am looking into macOS at the moment.
Is there anything I could help with? I can think of concepts for 1)...
@faroit For time, I am inclined to use int64 with a milli/micro/nano scale, and to treat audio as a time series. We may need to see which scale is most suitable for audio.
Since several other places use time/time series (pcap, prometheus, to name a few), we may want to consolidate on one time scale.
Also /cc @ivelin in case you are interested, as it might cover both audio and PCAP.
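As a rough illustration of the scale question above, the sketch below converts a sample index to an integer timestamp at each candidate scale, using integer arithmetic only. The function name sample_to_timestamp and the UNITS_PER_SECOND table are assumptions made for this example and are not part of tensorflow-io.

```python
# Timestamp units per second for each candidate scale (illustrative names).
UNITS_PER_SECOND = {
    "milli": 1_000,
    "micro": 1_000_000,
    "nano": 1_000_000_000,
}


def sample_to_timestamp(index, sample_rate, scale):
    """Return the timestamp of sample `index` as an integer in `scale` units.

    Integer floor division keeps the result representable as int64 and
    avoids floating-point rounding for long recordings.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return index * UNITS_PER_SECOND[scale] // sample_rate


if __name__ == "__main__":
    # Compare how the first few samples at 44.1 kHz map onto each scale.
    for scale in UNITS_PER_SECOND:
        stamps = [sample_to_timestamp(i, 44_100, scale) for i in range(3)]
        print(scale, stamps)
```

At 44.1 kHz one sample lasts about 22.7 microseconds, so a millisecond scale maps many consecutive samples to the same timestamp, while micro and nano scales keep neighbouring samples distinct, although still not exactly, since the sample period is not a whole number of either unit.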
Yeah, I was hitting the same problem trying to speed up this tutorial by converting it to tfio.
But you can't use from_ffmpeg in a dataset.map, because dataset.map runs in graph mode only.
https://drive.google.com/file/d/1N_cH3R03D1vIbTFomAfZSu4ZhGdkwXhP/view?usp=sharing