Please, add `load_audio_file_from_memory`
🚀 The feature
This request is for C++ only. In torchaudio/csrc/sox/io.h there is a function load_audio_file(...) that reads the contents of a file from disk into a torch::Tensor. However, there is no API for loading a torch::Tensor from bytes that are already in memory.
Motivation, pitch
For example: in a C++ TCP server app, I receive the content of an audio file from a socket read, so my audio data is stored as an array of bytes in RAM. Now I want to process this audio with torch, which means reading this byte array into a torch::Tensor. But the torchaudio C++ API has no direct method for this. Currently I have to save the byte array to disk and then call load_audio_file(...). This is clearly very inefficient (a write to disk followed by a read from disk).
Alternatives
No response
Additional context
No response
Hi @pi-null-mezon
Thanks for filing the feature request. There are two non-technical concerns around the C++ API.
- We do not have a policy around the C++ API. Currently, the C++ code is not public API, and we are not committed to keeping backward compatibility.
- There is currently no framework for testing (and guaranteeing) the C++ API.
The surface of the C++ API is pretty much the same as the corresponding Python API, but we make no promise about it. This makes it awkward to create an API tailored for use from C++. I imagine that, at its core, an API for in-memory decoding would take something like `void* data_pointer` and `size_t data_size`. We do not have fixtures to guarantee the API, and we do not have guidance for maintaining it (i.e. when and how may the API change? What justifies a BC-breaking change?).
Having said that, I think one solution with the current codebase (main or the 0.12 release) that does not require a new API from us is to wrap the socket connection in a class that mimics the behavior of a file-like object, create an AVIOContext from it, and pass that to StreamReader. That way, your application can fetch data incrementally, which is often better than fetching all the data and keeping it in memory.
What do you think?
Hi @mthrok
- I got it, thanks
- Maybe it's awkward, for the reasons you have mentioned. But from my perspective, it should be pretty simple for the maintainers to add the requested function, and the community would be quite satisfied with `void* data_pointer` and `size_t data_size`.
Can I find examples with AVIOContext and StreamReader somewhere?
@pi-null-mezon
> Maybe it's awkward, for the reasons you have mentioned. But from my perspective, it should be pretty simple for the maintainers to add the requested function, and the community would be quite satisfied with `void* data_pointer` and `size_t data_size`.
Sure, it is easy to add, but the lack of a policy about a public C++ API makes it difficult to maintain. Because this new function would not be connected to any public API on the Python surface, we would have no means to maintain it. Suppose a new developer takes over the maintenance work and sees that this API is not used anywhere in the library, is not documented, and that torchaudio does not in general have an official C++ API. Logically, it would be fine to delete the API at any time.
In simple terms, since we have not committed to maintaining a C++ API, we cannot add it.
> Can I find examples with AVIOContext and StreamReader somewhere?
The code that handles file-like objects for StreamReader is one example, though not a great one. It is found in the csrc/ffmpeg/pybind directory.
The following is the gist:
- Write a read function that wraps the data source and can be passed to the `avio_alloc_context` function. The following is the example from the code above: https://github.com/pytorch/audio/blob/c26b38b29b3f3f972f50057df20d0a226dc062a4/torchaudio/csrc/ffmpeg/pybind/typedefs.cpp#L7-L32
  The file-like object passed from Python is stored in the opaque pointer. The read function retrieves it, calls the equivalent method on the Python object, and then copies the data to the given destination memory location. In your case, wrap either the socket connection object or the data you have already read from the socket, and implement read around it.
- [optional] Write a seek function, similar to the read function above. https://github.com/pytorch/audio/blob/c26b38b29b3f3f972f50057df20d0a226dc062a4/torchaudio/csrc/ffmpeg/pybind/typedefs.cpp#L34-L41
  I am not sure whether your socket object can perform a seek operation. If it can, providing a seek function increases the range of formats that can be decoded. If the socket does not support seeking, you can implement read and seek on the media data saved in memory.
  Note: If you write this function over data saved in memory, returning the total data size in response to `AVSEEK_SIZE` will improve performance. (In the above implementation we do not know the size, so it is not used.)
- Create an AVIOContext object using the opaque pointer, the read function, and the optional seek function. https://github.com/pytorch/audio/blob/c26b38b29b3f3f972f50057df20d0a226dc062a4/torchaudio/csrc/ffmpeg/pybind/typedefs.cpp#L43-L64
- Instantiate a `StreamReader` object. https://github.com/pytorch/audio/blob/c26b38b29b3f3f972f50057df20d0a226dc062a4/torchaudio/csrc/ffmpeg/pybind/stream_reader.cpp#L25-L35
  Once the AVIOContext is created, pass it to the constructor of `StreamReader`; the `StreamReader` will then fetch data through the read function you implemented. In the above example the `StreamReaderBinding` class is used because of a limitation of PyBind/TorchBind, but since you are working in C++ you can use the `StreamReader` class directly. Note that the underlying opaque object (your socket connection) has to stay alive and valid throughout the lifetime of the `StreamReader` instance. The usage of `StreamReader` is pretty much the same as the equivalent Python version.
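The read and seek steps above can be sketched over a byte buffer that is already in memory. This is only a minimal sketch, not torchaudio code: `MemorySource`, `read_func`, `seek_func`, `kSeekSize`, and `kEof` are hypothetical names, and the two constants are stand-ins for FFmpeg's `AVSEEK_SIZE` and `AVERROR_EOF` so that the snippet compiles without FFmpeg headers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdio>   // SEEK_SET, SEEK_CUR, SEEK_END
#include <cstring>
#include <vector>

// Stand-ins so the sketch compiles without FFmpeg. In a real build, include
// <libavformat/avio.h> and <libavutil/error.h> and use AVSEEK_SIZE and
// AVERROR_EOF instead of these constants.
constexpr int kSeekSize = 0x10000;  // same value as FFmpeg's AVSEEK_SIZE
constexpr int kEof = -1;            // placeholder only, NOT the real AVERROR_EOF

// Hypothetical wrapper around the bytes already read from the socket.
struct MemorySource {
  std::vector<uint8_t> data;
  int64_t pos = 0;
};

// Same signature as the read callback taken by avio_alloc_context.
// Copies up to buf_size bytes from the current position into buf.
int read_func(void* opaque, uint8_t* buf, int buf_size) {
  assert(buf_size >= 0);
  auto* src = static_cast<MemorySource*>(opaque);
  const int64_t remaining = static_cast<int64_t>(src->data.size()) - src->pos;
  if (remaining <= 0) {
    return kEof;  // nothing left to read
  }
  const int n = static_cast<int>(std::min<int64_t>(remaining, buf_size));
  std::memcpy(buf, src->data.data() + src->pos, static_cast<size_t>(n));
  src->pos += n;
  return n;
}

// Same signature as the seek callback taken by avio_alloc_context.
// Answers the size query and moves the read position otherwise.
int64_t seek_func(void* opaque, int64_t offset, int whence) {
  auto* src = static_cast<MemorySource*>(opaque);
  const int64_t size = static_cast<int64_t>(src->data.size());
  if (whence & kSeekSize) {
    return size;  // total size is known because the data is in memory
  }
  int64_t target = 0;
  switch (whence) {
    case SEEK_SET: target = offset; break;
    case SEEK_CUR: target = src->pos + offset; break;
    case SEEK_END: target = size + offset; break;
    default: return -1;  // unsupported mode
  }
  if (target < 0 || target > size) {
    return -1;  // out of range
  }
  src->pos = target;
  return target;
}
```

With FFmpeg available, these would be handed over in a call along the lines of `avio_alloc_context(buffer, buffer_size, 0, &source, read_func, nullptr, seek_func)`, and the resulting AVIOContext passed to `StreamReader` as described above. The `MemorySource` instance has to outlive the `StreamReader`.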
Thank you. I'll try.
Adding Tensor-based in-memory support to StreamReader in #2694
FYI: We will be adding a C++ API for this through the StreamReader class.
Please refer to torchaudio::io::StreamReaderCustomIO for this use case.
Hi @mthrok, thank you for adding this feature. How do I convert in-memory data to a torch tensor using torchaudio::io::StreamReaderCustomIO? I have audio data in the form of std::vector<std::vector<float>> and want to convert it into a tensor. I want to have a tensor from in-memory data similar to the return value of torchaudio::sox::load_audio_file.
@divyansh2681 Converting numerical vectors to a tensor is not what this API does. There are a bunch of websites and forums that explain how to do it, like this one: https://stackoverflow.com/questions/63466847/how-is-it-possible-to-convert-a-stdvectorstdvectordouble-to-a-torchten Please refer to them.
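As a rough illustration of the usual first step in such a conversion, nested vectors can be copied into one contiguous buffer. This is a sketch under assumptions: `flatten_channels` is a hypothetical helper, and it assumes the outer vector holds channels and each inner vector holds that channel's frames, all of equal length.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: copies [channel][frame] samples into one contiguous,
// row-major buffer. Throws if the channels have different lengths.
std::vector<float> flatten_channels(
    const std::vector<std::vector<float>>& channels) {
  if (channels.empty()) {
    return {};
  }
  const std::size_t num_frames = channels.front().size();
  std::vector<float> flat;
  flat.reserve(channels.size() * num_frames);
  for (const auto& channel : channels) {
    if (channel.size() != num_frames) {
      throw std::invalid_argument("all channels must have the same length");
    }
    flat.insert(flat.end(), channel.begin(), channel.end());
  }
  assert(flat.size() == channels.size() * num_frames);
  return flat;
}
```

With libtorch, the flat buffer can then be wrapped with something like `torch::from_blob(flat.data(), {num_channels, num_frames}, torch::kFloat32).clone()`, where the `clone()` gives the tensor its own copy so it does not depend on the lifetime of `flat`.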
Ah I see, thank you for your help!