Add high-level IO functions for image and video

Open mthrok opened this issue 2 years ago • 1 comments

By leveraging StreamReader, we can load video and images. We should add the functions

  • torchaudio.io.load_image
  • torchaudio.io.load_video
  • torchaudio.io.load_audio

which are thin wrappers around StreamReader

(and perhaps save versions).
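
To make the proposal concrete, here is a minimal sketch of what such thin wrappers might look like. It assumes the torchaudio 2.x StreamReader API (add_basic_video_stream / add_basic_audio_stream taking frames_per_chunk and num_chunks, process_all_packets, pop_chunks) and a working FFmpeg install. The function names follow the proposal above and do not exist in torchaudio today; the make_reader parameter and the _load_all helper are hypothetical, added only so the sketch can be exercised with a stub.

```python
from typing import Any, Callable, Optional


def _default_reader(src: str) -> Any:
    # Imported lazily so the module can be imported (and tested with a stub)
    # without torchaudio/FFmpeg being installed.
    from torchaudio.io import StreamReader

    return StreamReader(src)


def _load_all(src: str, add_stream: str,
              make_reader: Optional[Callable[[str], Any]] = None) -> Any:
    """Decode one whole stream from `src` and return it as a single chunk."""
    reader = (make_reader or _default_reader)(src)
    # Assumption (torchaudio 2.x): frames_per_chunk=-1 disables chunking and
    # num_chunks=-1 keeps every decoded frame in the buffer.
    getattr(reader, add_stream)(frames_per_chunk=-1, num_chunks=-1)
    reader.process_all_packets()
    (chunk,) = reader.pop_chunks()  # exactly one output stream was added
    return chunk


def load_audio(src: str, make_reader=None) -> Any:
    # Hypothetical wrapper: all audio frames of the default audio stream.
    return _load_all(src, "add_basic_audio_stream", make_reader)


def load_video(src: str, make_reader=None) -> Any:
    # Hypothetical wrapper: all video frames of the default video stream.
    return _load_all(src, "add_basic_video_stream", make_reader)


def load_image(src: str, make_reader=None) -> Any:
    # An image is treated here as a video stream with a single frame.
    return load_video(src, make_reader)[0]
```

Options such as sample rate, pixel format, or resizing are left out; a real implementation would presumably forward them to the add_basic_*_stream calls.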

mthrok avatar May 10 '23 14:05 mthrok

Also, it is sometimes useful, UX-wise, to have parallelized, batched versions of these functions. Some related issues I opened in torchvision:

  • https://github.com/pytorch/vision/issues/4988
  • https://github.com/pytorch/vision/issues/5461

One potential issue is that the return type (Tensor or TensorList/NestedTensor) would depend on the argument type (a path or a list of paths). If that's something you'd like to avoid, the batched version could be forced to always require an out= argument (which could be a nice feature in itself, given that the output could be memory pinned for CUDA). Another hack would be to have a separate set of functions working on TensorList (IMO the less nice way; nicer would be to have some good dispatch).
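
As a rough illustration of the two options (a separate batched function versus dispatching on the argument type), here is a sketch using only the standard library. The names load_many and load, and the idea of passing the single-item loader as a callable, are assumptions for illustration and not an existing torchaudio or torchvision API. The out= variant is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from os import PathLike
from typing import Any, Callable, Iterable, List, Union

PathT = Union[str, PathLike]


def load_many(paths: Iterable[PathT], loader: Callable[[PathT], Any],
              max_workers: int = 4) -> List[Any]:
    """Separate batched entry point: always returns a list, in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(pool.map(loader, paths))


def load(src: Union[PathT, Iterable[PathT]], loader: Callable[[PathT], Any],
         max_workers: int = 4) -> Any:
    """Dispatching entry point: the return type follows the argument type."""
    if isinstance(src, (str, PathLike)):
        return loader(src)  # single path -> single result
    return load_many(src, loader, max_workers)  # many paths -> list
```

With load_many the return type is always a list, which avoids the ambiguity at the cost of a second function; load is more convenient but has exactly the argument-dependent return type described above. Threads only help here if the underlying decoder releases the GIL, which is an assumption about the loader.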

vadimkantorov avatar May 27 '23 17:05 vadimkantorov