Add ability to use `from_ffmpeg` as GraphIOTensor

Open faroit opened this issue 6 years ago • 5 comments

The audio graph mode introduced in #615 greatly improves audio loading in tf.data pipelines. When external (ffmpeg) decoding is required (in addition to #650), it would be great if GraphIOTensor support could be added to from_ffmpeg as well, so that the following function could be used in dataset.map(loadfromffmpeg):

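import tensorflow as tf
import tensorflow_io as tfio

# Requested usage: from_ffmpeg on a graph-mode IOTensor (not yet supported).
@tf.function
def loadfromffmpeg(fp):
    audio = tfio.IOTensor.graph(tf.int16).from_ffmpeg(fp)
    # Scale int16 samples to floats in roughly [-1, 1].
    return tf.cast(audio.to_tensor(), tf.float32) / 32767.0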

I don't know what the best API would be to support lazy loading via __getitem__, as from_audio does. Partial (chunked) loading through ffmpeg with start/duration given in samples will not be possible, since compressed formats such as mp3 do not support exact seeking.

I would therefore propose not to support indexing, but rather to add start and duration parameters, both in seconds, for now.
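To make the start/duration idea concrete, here is a minimal sketch that does not use tensorflow-io at all. It builds an ffmpeg command line that seeks by seconds and writes raw 16-bit PCM to stdout, then scales the samples the same way as the function above. The helper names (build_ffmpeg_cmd, pcm16_to_float, load_segment) and the default sample rate and channel count are hypothetical, chosen for illustration only; the ffmpeg flags themselves (-ss, -t, -i, -f s16le, -ac, -ar) are standard options. It assumes an ffmpeg binary is available on PATH.

```python
import array
import subprocess
import sys


def build_ffmpeg_cmd(path, start=0.0, duration=None, sample_rate=44100, channels=1):
    """Build an ffmpeg command that decodes a time window to raw s16le PCM on stdout."""
    cmd = ["ffmpeg", "-nostdin", "-loglevel", "error"]
    if start > 0:
        cmd += ["-ss", str(start)]      # seek position in seconds (input option)
    if duration is not None:
        cmd += ["-t", str(duration)]    # window length in seconds (input option)
    cmd += ["-i", path, "-f", "s16le", "-ac", str(channels),
            "-ar", str(sample_rate), "pipe:1"]
    return cmd


def pcm16_to_float(raw):
    """Convert little-endian 16-bit PCM bytes to a list of floats in roughly [-1, 1]."""
    samples = array.array("h")
    samples.frombytes(raw[: len(raw) - len(raw) % 2])  # drop a trailing odd byte
    if sys.byteorder == "big":
        samples.byteswap()              # s16le is little-endian
    return [s / 32767.0 for s in samples]


def load_segment(path, start=0.0, duration=None):
    """Decode a window of an audio file by running ffmpeg (hypothetical helper)."""
    result = subprocess.run(build_ffmpeg_cmd(path, start, duration),
                            check=True, stdout=subprocess.PIPE)
    return pcm16_to_float(result.stdout)
```

Because the window is expressed in seconds and ffmpeg decides where decoding actually starts, this avoids promising sample-exact indexing for compressed formats.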

faroit avatar Nov 20 '19 09:11 faroit

@yongtang did you check if this is feasible?

faroit avatar Dec 06 '19 16:12 faroit

@faroit Yes, it is still in the works. Several items need to be sorted out: 1) the representation of time and how it could be used in __getitem__; 2) ffmpeg processing is ideally suited to Linux, but on other platforms (Windows and macOS) it might be better to go with platform-specific codecs. I am looking into macOS at the moment.

yongtang avatar Dec 06 '19 16:12 yongtang

Is there anything I could help with? I can think of concepts for 1)...

faroit avatar Dec 06 '19 19:12 faroit

@faroit For time, I am inclined to use int64 at a milli/micro/nano scale and to treat audio as a time series. We may need to see which scale suits audio best.

Since several other places use time/time series (pcap, prometheus, to name a few), we may want to consolidate on one time scale.
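The choice of scale can be explored with a small sketch like the one below. It maps sample indices to integer timestamps at a given number of ticks per second and checks whether the mapping can be inverted without losing samples. The function names and the SCALES table are hypothetical and only illustrate the trade-off; they do not reflect how tfio represents time.

```python
from fractions import Fraction

# Ticks per second for each candidate integer time scale.
SCALES = {"milli": 10**3, "micro": 10**6, "nano": 10**9}


def sample_to_timestamp(index, sample_rate, ticks_per_second):
    """Timestamp of a sample as an integer tick count, rounded to the nearest tick."""
    return round(Fraction(index * ticks_per_second, sample_rate))


def timestamp_to_sample(ticks, sample_rate, ticks_per_second):
    """Nearest sample index for an integer tick count."""
    return round(Fraction(ticks * sample_rate, ticks_per_second))


def round_trips(sample_rate, ticks_per_second, n_samples):
    """True if every one of the first n_samples indices survives index -> ticks -> index."""
    return all(
        timestamp_to_sample(
            sample_to_timestamp(i, sample_rate, ticks_per_second),
            sample_rate,
            ticks_per_second,
        ) == i
        for i in range(n_samples)
    )


if __name__ == "__main__":
    for name, ticks in SCALES.items():
        print(name, round_trips(44100, ticks, 1000))
```

At 44.1 kHz the sample period is about 22.7 microseconds, so a millisecond scale collapses many samples onto the same timestamp, while micro and nano scales keep neighbouring samples distinct. An int64 nanosecond count still covers roughly 292 years, so range is unlikely to be the limiting factor.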

Also /cc @ivelin in case you are interested, as it might cover both audio and PCAP.

yongtang avatar Dec 06 '19 20:12 yongtang

Yeah, I was hitting the same problem trying to speed up this tutorial by converting it to tfio.

But you can't use from_ffmpeg in a dataset.map, because dataset.map is graph mode only and from_ffmpeg has no graph-mode support.

https://drive.google.com/file/d/1N_cH3R03D1vIbTFomAfZSu4ZhGdkwXhP/view?usp=sharing

MarkDaoust avatar Jun 30 '23 22:06 MarkDaoust