Please, add `load_audio_file_from_memory`

🚀 The feature

This request is for C++ only. In torchaudio/csrc/sox/io.h there is a function load_audio_file(...) that reads the contents of a file from disk into a torch::Tensor. But there is no API to load a torch::Tensor from in-memory bytes.

Motivation, pitch

For instance: in a C++ TCP server app, I receive the contents of an audio file from a socket read, so my audio data is stored as an array of bytes in RAM. Now I want to process this audio with torch, which means I need to read this byte array into a torch::Tensor. But the torchaudio C++ API has no direct method to do this. Currently I have to save the byte array to disk and then call load_audio_file(...). This is clearly very inefficient (a write to disk followed by a read from disk).

Alternatives

No response

Additional context

No response

Jul 20 '22 13:07 pi-null-mezon

Hi @pi-null-mezon

Thanks for filing the feature request. There are two non-technical concerns around the C++ API.

1. We do not have a policy around the C++ API. Currently, the C++ code is not a public API, and we are not committed to keeping backward compatibility.
2. There is currently no framework for testing (and ensuring) the C++ API. The surface of the C++ API is pretty much the same as the corresponding Python API, but we make no promise about it. This makes it awkward to create an API tailored for use from C++. I imagine that, at its core, an API for in-memory decoding would take something like void* data_pointer and size_t data_size. We do not have fixtures to guarantee the API, and we do not have guidance for maintaining it (i.e. when and how to change the API? What justifies a BC-breaking change?).

Having said that, I think one solution with the current codebase (main or the 0.12 release) that does not require a new API from us is to wrap the socket connection in a class that mimics the behavior of a file-like object, create an AVIOContext, and pass it to StreamReader. That way, your application can fetch data interactively, which is often better than fetching all the data and keeping it in memory.

What do you think?

Jul 25 '22 23:07 mthrok

Hi @mthrok

I got it, thanks.
Maybe it's awkward, for the reasons you have mentioned. But from my perspective, it should be pretty simple for the maintainers to add the requested function, and the community would be quite satisfied with void* data_pointer and size_t data_size.

Where can I find examples using AVIOContext and StreamReader?

Jul 26 '22 06:07 pi-null-mezon

@pi-null-mezon

> Maybe it's awkward, for the reasons you have mentioned. But from my perspective, it should be pretty simple for the maintainers to add the requested function, and the community would be quite satisfied with void* data_pointer and size_t data_size.

Sure, it is easy to add, but the lack of a policy on a public C++ API makes it difficult to maintain. Because this new function would not be connected to any public API on the Python surface, we have no means of maintaining it. Suppose a new developer takes over the maintenance work and sees that this API is not used anywhere in the library, is not documented, and that torchaudio in general has no official C++ API. Logically, it would be fine to delete the API at any time.

In simple terms, since we have not promised to maintain a C++ API, we cannot add it.

> Where can I find examples using AVIOContext and StreamReader?

The code for handling file-like objects in StreamReader is one example, though not a great one. It is found in the csrc/ffmpeg/pybind directory.

The following is the gist:

1. Write a read function that wraps the data source and can be passed to the avio_alloc_context function. The following is the example from the code above:
   https://github.com/pytorch/audio/blob/c26b38b29b3f3f972f50057df20d0a226dc062a4/torchaudio/csrc/ffmpeg/pybind/typedefs.cpp#L7-L32
   The file-like object passed from Python is stored in the opaque pointer; the read function retrieves it, calls the equivalent method on the Python object, then copies the data to the given destination memory location. In your case, wrap either the socket connection object or the data you have already read from the socket, and implement read around it.
2. [Optional] Write a seek function, similar to the read function above. I am not sure whether your socket object can perform a seek operation. If it can, providing a seek function increases the range of formats that can be handled. If the socket does not support seeking, you can implement read and seek on the media data held in memory.
   https://github.com/pytorch/audio/blob/c26b38b29b3f3f972f50057df20d0a226dc062a4/torchaudio/csrc/ffmpeg/pybind/typedefs.cpp#L34-L41
   Note: If you write this function over data held in memory, returning the total data size in response to AVSEEK_SIZE will improve performance. (In the above implementation, we do not know the size, so it is not used.)
3. Create an AVIOContext object using the opaque pointer and the read (and optional seek) function.
   https://github.com/pytorch/audio/blob/c26b38b29b3f3f972f50057df20d0a226dc062a4/torchaudio/csrc/ffmpeg/pybind/typedefs.cpp#L43-L64
4. Instantiate a StreamReader object.
   https://github.com/pytorch/audio/blob/c26b38b29b3f3f972f50057df20d0a226dc062a4/torchaudio/csrc/ffmpeg/pybind/stream_reader.cpp#L25-L35
   Once the AVIOContext is created, pass it to the constructor of StreamReader; the StreamReader will then fetch data through the read function you implemented. Note that in the above example the StreamReaderBinding class is used because of a limitation of PyBind/TorchBind, but since you are working from C++, you can use the StreamReader class directly. Also note that the underlying opaque object (your socket connection) has to stay alive and valid throughout the lifetime of the StreamReader instance. The usage of StreamReader is pretty much the same as that of the equivalent Python version.
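
For the in-memory case, steps 1 and 2 could be sketched roughly as follows. This is only an illustration, not code from torchaudio: `MemorySource`, `read_packet` and `seek_bytes` are hypothetical names, and the three constants are local stand-ins that mirror the values of FFmpeg's `AVSEEK_SIZE`, `AVSEEK_FORCE` and `AVERROR_EOF` so that the snippet compiles without the FFmpeg headers. In a real build you would take them from `<libavformat/avio.h>` and `<libavutil/error.h>` instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <cstring>
#include <vector>

// Local stand-ins (assumption: values copied from the FFmpeg headers).
constexpr int kAvSeekSize = 0x10000;     // AVSEEK_SIZE
constexpr int kAvSeekForce = 0x20000;    // AVSEEK_FORCE
constexpr int kAvErrorEof = -541478725;  // AVERROR_EOF

// Hypothetical wrapper: the bytes already read from the socket,
// plus the current read position. This is what "opaque" points to.
struct MemorySource {
  std::vector<uint8_t> data;
  int64_t pos = 0;
};

// Read callback in the shape avio_alloc_context expects:
// copy up to buf_size bytes into buf, return the count or EOF.
int read_packet(void* opaque, uint8_t* buf, int buf_size) {
  assert(buf_size >= 0);
  auto* src = static_cast<MemorySource*>(opaque);
  const int64_t remaining = static_cast<int64_t>(src->data.size()) - src->pos;
  if (remaining <= 0) return kAvErrorEof;
  const int n = static_cast<int>(std::min<int64_t>(remaining, buf_size));
  std::memcpy(buf, src->data.data() + src->pos, static_cast<std::size_t>(n));
  src->pos += n;
  return n;
}

// Seek callback: answers the AVSEEK_SIZE query with the total size,
// otherwise moves the read position like fseek would.
int64_t seek_bytes(void* opaque, int64_t offset, int whence) {
  auto* src = static_cast<MemorySource*>(opaque);
  const int64_t size = static_cast<int64_t>(src->data.size());
  whence &= ~kAvSeekForce;  // the force flag does not matter for memory
  if (whence == kAvSeekSize) return size;
  int64_t target = 0;
  switch (whence) {
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = src->pos + offset; break;
    case SEEK_END: target = size + offset; break;
    default: return -1;
  }
  if (target < 0 || target > size) return -1;
  src->pos = target;
  return target;
}
```

With FFmpeg available, these two functions would be passed as the read and seek arguments of `avio_alloc_context`, with a pointer to the `MemorySource` as the opaque argument. The `MemorySource` must outlive the resulting AVIOContext, as noted in step 4.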

Jul 29 '22 22:07 mthrok

Thank you. I'll try.

Aug 01 '22 07:08 pi-null-mezon

Adding Tensor-based in-memory support to StreamReader in #2694

Sep 19 '22 16:09 mthrok

FYI: We will be adding C++ API for this through StreamReader class.

Jan 20 '23 06:01 mthrok

Please refer to torchaudio::io::StreamReaderCustomIO for this use case.

Jul 31 '23 16:07 mthrok

Hi @mthrok, thank you for adding this feature. How do I convert in-memory data to a torch tensor using torchaudio::io::StreamReaderCustomIO? I have audio data in the form of std::vector<std::vector<float>> and want to convert it into a tensor. I want to have a tensor from in-memory data similar to the return value of torchaudio::sox::load_audio_file.

Sep 22 '23 00:09 divyansh2681

@divyansh2681 Converting numerical vectors to a tensor type is not what this API does. There are a bunch of websites and forums that explain how to do it, like this one: https://stackoverflow.com/questions/63466847/how-is-it-possible-to-convert-a-stdvectorstdvectordouble-to-a-torchten Please refer to them.
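
As a rough sketch of the usual first step in such a conversion: the nested vector is copied into one contiguous buffer, since a tensor needs contiguous storage. `FlatAudio` and `flatten` are hypothetical names, and the [channel][frame] layout of the input is an assumption. The snippet uses only the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical holder for the flattened samples and their shape.
struct FlatAudio {
  std::vector<float> samples;  // all frames of channel 0, then channel 1, ...
  std::size_t channels = 0;
  std::size_t frames = 0;
};

// Copy [channel][frame] samples into one contiguous row-major buffer.
// Assumes every channel holds the same number of frames; throws otherwise.
FlatAudio flatten(const std::vector<std::vector<float>>& nested) {
  FlatAudio out;
  out.channels = nested.size();
  out.frames = nested.empty() ? 0 : nested.front().size();
  out.samples.reserve(out.channels * out.frames);
  for (const auto& channel : nested) {
    if (channel.size() != out.frames) {
      throw std::invalid_argument("channels have different lengths");
    }
    out.samples.insert(out.samples.end(), channel.begin(), channel.end());
  }
  return out;
}
```

With libtorch, the resulting buffer could then be wrapped with `torch::from_blob` using the shape {channels, frames} and cloned, so that the tensor owns its memory.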

Sep 22 '23 13:09 mthrok

> @divyansh2681 Converting numerical vectors to a tensor type is not what this API does. There are a bunch of websites and forums that explain how to do it, like this one: https://stackoverflow.com/questions/63466847/how-is-it-possible-to-convert-a-stdvectorstdvectordouble-to-a-torchten Please refer to them.

Ah I see, thank you for your help!

Sep 25 '23 18:09 divyansh2681
