
Update on TorchAudio’s future

Open scotts opened this issue 7 months ago • 8 comments

Dear TorchAudio users,

TorchAudio is the most popular audio library for PyTorch. It has critical transforms, models and datasets that we know the community relies on. That is why we wanted to let the community know that we have started a refactoring effort to transition TorchAudio into a maintenance phase. This process will involve removal of some user-facing features. We have three goals we want to achieve with this effort:

  1. Make TorchAudio easier to maintain to ensure long-term reliability. We plan to eliminate all C++ code so that TorchAudio is a Python-only library. We also plan to reduce external dependencies as much as possible. Both efforts will simplify testing and release.
  2. Reduce redundancies with the rest of the PyTorch ecosystem. Some of the functionality in TorchAudio is also available in TorchVision and TorchCodec. We are working across all three libraries to ensure a given capability lives in one library.
  3. Focus on TorchAudio’s strengths. Those strengths are the audio transforms, models and datasets that are integral to users' training and inference pipelines. As a result, we will deprecate and eventually remove some functionality that is outside of these strengths.

The diagram below depicts the various components of TorchAudio. We have highlighted it according to the user-facing API changes that we are making:

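[Image: diagram of TorchAudio's components, highlighted according to the planned user-facing API changes]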

Starting with TorchAudio 2.8 (expected around August 2025), APIs slated for removal will trigger a deprecation warning. These APIs will be fully removed in TorchAudio 2.9 (anticipated by the end of 2025).

Most of the APIs in transforms, functional, compliance.kaldi, models and pipelines modules will remain. These are the APIs that we identified as the most popular and valuable ones.

  • A few APIs, specifically those relying on C++ implementations like RNNT loss and forced-alignment, may be dropped. Some, like lfilter and overdrive, will switch to pure-Python implementations, which might affect performance. We are exploring options to retain C++-backed APIs, but this is unlikely.
  • Remaining APIs will be compatible with the latest stable PyTorch version. No new features will be added.

The decoding and encoding capabilities of TorchAudio for both audio and video data will migrate to TorchCodec, where we are consolidating all of PyTorch media decoding and encoding. TorchAudio’s decoding and encoding APIs will be deprecated from TorchAudio 2.8, and they will be removed in TorchAudio 2.9, so we encourage users to migrate to TorchCodec as soon as possible. TorchCodec already supports video and audio decoding, and encoding will be supported soon. While there isn't a direct 1:1 API mapping, the migration process should be smooth. Please report any issues in the TorchCodec repository.

All other modules and APIs will be removed in TorchAudio 2.9.

We understand that these changes may be disruptive. We believe that they are unfortunately necessary, in order for us to guarantee TorchAudio’s stability in the future.

scotts avatar Apr 24 '25 14:04 scotts

Hi @scotts, thanks for reporting the status of torchaudio and future plans.

I don't understand the decision to drop the C++/CUDA extensions... They are implemented because of the super inefficiency if they're done in pure Python (with JIT compilation). Just like you said, Torchaudio's strength is its various audio transforms. Thus, they should be kept instead of removed. Switching back to pure-Python implementations is like going backwards and makes no sense. These low-level implementations enable a state-of-the-art training speed compared to other libraries. (check out torchaudio 2.1 ASRU paper.) The lfilter has recently been used in torchfx as the low-level operator for differentiable and fast filtering on GPU. They're valuable to the community, and the decision to drop them is unwise, disruptive, and disastrous.
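
To make the performance concern concrete, here is a minimal pure-Python sketch of the recurrence that an IIR filter such as lfilter evaluates. It is not TorchAudio's implementation: the name `lfilter_reference` and the coefficient layout (`b` feed-forward, `a` feedback, `a[0]` as normalizer) are assumptions chosen for illustration. Each output sample depends on earlier outputs, so the loop over time is inherently sequential, which is where compiled kernels tend to pay off.

```python
def lfilter_reference(x, b, a):
    """Direct-form IIR filter over a list of floats (illustrative only).

    y[n] = (sum_k b[k] * x[n-k] - sum_{k>=1} a[k] * y[n-k]) / a[0]
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Feed-forward part: weighted sum of current and past inputs.
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback part: depends on outputs already computed, so the
        # time loop cannot simply be replaced by one vectorized call.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc / a[0])
    return y


# One-pole example: an impulse decays by half at every step.
impulse_response = lfilter_reference([1.0, 0.0, 0.0, 0.0], b=[1.0], a=[1.0, -0.5])
```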

There should be more discussions on this before making the decision. I suggest holding back this decision.

Best wishes,

Chin-Yun PhD student

Centre for Digital Music School of Electronic Engineering and Computer Science Queen Mary University of London Email: [email protected]

yoyolicoris avatar Apr 24 '25 16:04 yoyolicoris

@scotts thanks for the update! Removing the C++/CUDA extensions is a big step backwards for the community and makes some of the implementations essentially useless due to their slow Python-only versions. I understand some concessions must be made if PyTorch Audio is no longer going to be actively developed, but I would also highly encourage reconsidering the removal of the C++ extensions, at least for the most popular operators. Thanks!

christhetree avatar Apr 25 '25 10:04 christhetree

About lfilter, it would be nice to match the scipy precision and behaviour. I understand the big picture, but the mismatch creates a lot of work.

bruAristimunha avatar Apr 25 '25 21:04 bruAristimunha

@yoyolicoris, @christhetree, thanks for taking the time to reply. I understand that removing C++ implementations may be a performance regression for those components. I would like to further explain the motivation for why removing this C++ code specifically improves the long-term health of TorchAudio:

  1. C++ compilation complicates testing. Because we need to use different C++ compilers across the cross product of all supported platforms (Linux, Windows and Mac) and architectures (x86, aarch64), there is a much greater chance of breakage. A Python-only repo reduces the testing matrix down to just platform and Python version.
  2. C++ binaries complicate release. Each entry in the cross product of platforms, architectures, devices and Python versions requires a separate wheel. Because of this, the "TorchAudio 2.7 release" is actually 109 wheel files. A Python-only repo reduces that down to the same number of wheels as supported devices, which I think would be just 4.
  3. The Torch C++ API is not ABI-stable, and all libraries that use the C++ API must release with each new version of PyTorch. This means that points 1 and 2 must be dealt with on the regular PyTorch release cadence, which is roughly every 3 months.
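
As a rough illustration of point 2, the sketch below enumerates a cross product of build axes. The axis values are hypothetical placeholders, not the actual TorchAudio release matrix, and in practice not every combination is built, so the total will not match the 109 wheels quoted above. It only shows how quickly the count grows compared with one wheel per device.

```python
from itertools import product

# Hypothetical axes for illustration only; the real matrix differs.
platforms = ["linux", "windows", "macos"]
architectures = ["x86_64", "aarch64"]
devices = ["cpu", "accelerator-a", "accelerator-b", "accelerator-c"]
python_versions = ["3.9", "3.10", "3.11", "3.12", "3.13"]

# With compiled code, every combination may need its own wheel.
compiled_wheels = list(product(platforms, architectures, devices, python_versions))

# A pure-Python package needs only one wheel per supported device.
pure_python_wheels = list(devices)

print(len(compiled_wheels), "vs", len(pure_python_wheels))
```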

In the update, we did say: "We are exploring options to retain C++-backed APIs, but this is unlikely." Specifically, that exploration is whether we can take advantage of a new effort in PyTorch 2.7: a stable ABI. That only addresses point 3, but addressing point 3 could greatly reduce the cost of point 2. The cost of point 1 would still stand, though. For those interested in retaining various C++ components, let us know if you have the capacity to explore porting these components to the stable ABI. That changes the maintenance cost equation.

scotts avatar Apr 26 '25 03:04 scotts

  1. Make TorchAudio easier to maintain to ensure long-term reliability. We plan to eliminate all C++ code so that TorchAudio is a Python-only library. We also plan to reduce external dependencies as much as possible. Both efforts will simplify testing and release.

Maybe for some other C++ components, the model could be to factor them out into a separate repo that doesn't provide binary releases, supports only GitHub Actions CI for testing, and relies on users building it themselves.

Also, for some C++ code, maybe the load_inline(...) function (https://pytorch.org/docs/stable/cpp_extension.html#torch.utils.cpp_extension.load_inline) could be used or improved to simplify build scripts. That way the user would be responsible for having a working toolchain, and binaries would be built on the end user's machine.
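
A minimal sketch of what the load_inline approach could look like, following the pattern shown in the PyTorch documentation. The function name `build_sin_add` and the extension name `inline_sin_add` are made up for this example, and the C++ body is a toy operator, not any TorchAudio kernel. Compilation happens only when the function is called, and it assumes PyTorch plus a working C++ toolchain on the user's machine.

```python
# Toy C++ source compiled at runtime on the user's machine.
CPP_SOURCE = """
at::Tensor sin_add(at::Tensor x, at::Tensor y) {
  return x.sin() + y.sin();
}
"""


def build_sin_add():
    """Compile CPP_SOURCE and return the loaded extension module.

    Requires PyTorch and a C++ compiler; nothing is built until this runs.
    """
    from torch.utils.cpp_extension import load_inline

    return load_inline(
        name="inline_sin_add",   # hypothetical extension name
        cpp_sources=[CPP_SOURCE],
        functions=["sin_add"],   # generate Python bindings for this function
    )
```

Calling `build_sin_add().sin_add(x, y)` with two tensors would then run the compiled operator. The first call pays the compilation cost.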

Also, maybe a way forward would be to convert some C++ code to a pure C API (e.g. this could work for ffmpeg effects), to be called via ctypes (using the DLPack API or raw pointers for passing tensors for processing). This should eliminate the problem of the unstable PyTorch C++ ABI.

Regarding ffmpeg effects, maybe they could also be moved to torchcodec, as working with ffmpeg filter chains would be a very useful feature...


Another useful component in torchaudio is the bindings to flashlight, but flashlight itself has been discontinued for several years now. So probably the best path there would be factoring out the flashlight C++ code and the Python bindings in torchaudio into a new standalone repo, like Nvidia did: https://github.com/nvidia-riva/riva-asrlib-decoder . This is already half-done in https://github.com/flashlight/text, but it would be nice to maybe move the Python bindings https://pytorch.org/audio/0.12.0/models.decoder.html next to it? Also, given that Flashlight itself is discontinued, maybe it is worth moving the decoder out of the Flashlight org and into the pytorch org?

vadimkantorov avatar Apr 28 '25 18:04 vadimkantorov

Thank you for sharing this. I respect and love what you guys are doing, but you're treating Python like it's not Python. You already know that this means most of the library's APIs are going to be tens of times (if not hundreds of times) slower and more inefficient by all measures... Dropping C++ is not worth it here, it's not possible to match the performance with Python. To be fair, it's fast because it's not really Python code.

Thanks for all the efforts, I hope you refine your plans for TorchAudio at least to some extent.

parsasabetz avatar May 08 '25 15:05 parsasabetz

@scotts Also, it might be interesting to promote some of the stable signal processing functions / modules into PyTorch core (e.g. a new torch.signal namespace akin to https://docs.scipy.org/doc/scipy/reference/signal.html)?

vadimkantorov avatar May 12 '25 08:05 vadimkantorov

Another solution might be:

  • moving all python-only models/code to HuggingFace
  • moving mature functions/transforms to core pytorch
  • moving all other C++ extensions to use torch.utils.cpp_extension.load_inline or nvrtc (via https://github.com/NVIDIA/cuda-python)
  • maybe when possible - let go of using libtorch / torch::Tensor interface and replace it with DLPack interfacing or raw pointers, this would make user-side compilation very stable

vadimkantorov avatar May 16 '25 09:05 vadimkantorov

Hi all, here's a quick update, as we just published TorchAudio 2.8.

Deprecated APIs

Most APIs marked as "Drop" above are now explicitly deprecated: the docs carry deprecation notices, and using these APIs from Python raises a deprecation warning. They will be removed in the next version, 2.9.

Migration of load() and save() to TorchCodec

As we mentioned, we are consolidating the decoding and encoding capabilities of PyTorch in TorchCodec.

torchaudio.load() and torchaudio.save() are some of the most popular TorchAudio APIs, so for convenience we are providing torchaudio.load_with_torchcodec() and torchaudio.save_with_torchcodec(), which can largely be used as drop-in replacements. However, we do encourage users to directly migrate to TorchCodec's AudioDecoder() and AudioEncoder().

In future versions, torchaudio.load() and torchaudio.save() will still exist, but their underlying implementation will be relying on torchaudio.load_with_torchcodec() and torchaudio.save_with_torchcodec().

We hope for this migration to be as smooth as possible - most users should just need to pip install torchcodec, and things should still work as-is.

TorchCodec doesn't support Windows yet, but we're working hard on it. Please bear with us.
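
For readers migrating directly, here is a hedged sketch of a load()-style helper built on TorchCodec's AudioDecoder. The helper name `load_audio` and its `decoder_cls` parameter are inventions for this example (the parameter exists only so the helper can be exercised without TorchCodec installed). It assumes `AudioDecoder(path).get_all_samples()` returns an object with `data` and `sample_rate` attributes, and it does not cover the extra arguments that torchaudio.load() accepts.

```python
def load_audio(path, decoder_cls=None):
    """Return (waveform, sample_rate), similar in spirit to torchaudio.load().

    decoder_cls defaults to torchcodec.decoders.AudioDecoder and is imported
    lazily, so this module can be imported without TorchCodec present.
    """
    if decoder_cls is None:
        from torchcodec.decoders import AudioDecoder as decoder_cls

    samples = decoder_cls(path).get_all_samples()
    # samples.data is expected to be shaped (num_channels, num_samples).
    return samples.data, samples.sample_rate
```

With TorchCodec installed, `waveform, sample_rate = load_audio("speech.wav")` would decode the whole file, where `speech.wav` stands for any local audio file.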

C++ and CUDA extension

We mentioned that we were exploring options to retain the C++-backed APIs, which are currently slated for deletion. Specifically: forced_align, lfilter, overdrive, RNNT, and CUCTC.

While this isn't something I can assert with 100% certainty, we are now more confident that we'll be able to preserve these extensions by porting them to PyTorch's new "stable ABI" operators. We are actively working on it.

Nicolas

NicolasHug avatar Aug 06 '25 16:08 NicolasHug

Would it be another alternative to somehow convince core to take in all the C++/CUDA ops from torchaudio? (like some CTC impl is already in core, and lfilter might be the basis for new torch.signal namespace drawing from scipy.signal) :)

This would radically simplify build process of torchaudio and can make it Python-only

Given that the development of torchaudio is not increasing, could it be a good way forward?

vadimkantorov avatar Aug 06 '25 20:08 vadimkantorov

Hi guys, I am very glad to say that you have a great, fun product: a simple full stack with an interface to all the Hugging Face models, which makes it possible to build great apps even for personal needs. For that, I am very pleased to say you are doing a great job! But now the question is the reliability and resilience of the product. Could you please at least publish a release report stating what is up and running and what is still in development? I spent 3 months testing, with good results for translation and emotion, and now nothing works, even with help from OpenAI and the reference documentation and code. I want to do voice translation with a reference WAV, as I did 1 month ago. Which version of the Python module is the stable one?

Best regards

danielRodriguez1971 avatar Aug 07 '25 11:08 danielRodriguez1971

core to take in all the C++/CUDA ops from torchaudio?

This is something we've considered. It's true that it would simplify torchaudio's side, but it would offload some of the cost and debt onto core, so it's not a simple decision to make. At the moment, we think that porting the ops to the stable ABI is our best bet.

NicolasHug avatar Aug 12 '25 07:08 NicolasHug

Here from the deprecation warning.

Would appreciate an alternative that allows retaining some of these ops going forward (I'm developing an alignment library that is using forced_align).

What does it imply in practical terms that these will be ported to Pytorch's "stable ABI"? Will users be able to keep using them via Pytorch? Sorry for asking what may be an obvious question, but I'm just not knowledgeable about what it means for something to be ported to ABI.

Lauler avatar Aug 14 '25 12:08 Lauler

This platform is a buggy disorganized mess. Why would you be deprecating functionality that doesn't yet exist elsewhere (e.g., Windows support for TorchCodec) or is now much slower (e.g., removing CUDA extensions without a suitable replacement)?

xemma-official avatar Sep 03 '25 14:09 xemma-official

When my program runs:

    print(f"{torchaudio.list_audio_backends()}")
    []

The response is now:

    UserWarning: torchaudio._backend.list_audio_backends has been deprecated. This deprecation is part of a large refactoring effort to transition TorchAudio into a maintenance phase. The decoding and encoding capabilities of PyTorch for both audio and video are being consolidated into TorchCodec. Please see https://github.com/pytorch/audio/issues/3902 for more information. It will be removed from the 2.9 release.

So what code should I use instead if the sound file backend is not recognized?
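
One possible stand-in, assuming the goal is only to learn whether TorchCodec is importable before trying to decode, is a standard-library check like the sketch below. The helper name `package_available` is made up for this example, and it is not an official replacement for list_audio_backends().

```python
import importlib.util


def package_available(name):
    """Return True if the import system can locate a top-level package."""
    return importlib.util.find_spec(name) is not None


# True when TorchCodec is installed in the current environment.
has_torchcodec = package_available("torchcodec")
```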

wilsonmar avatar Sep 07 '25 00:09 wilsonmar

import warnings

warnings.filterwarnings("ignore", message=r".*(maintenance phase|TorchCodec).*")  # RIP torchaudio

jnordberg avatar Sep 14 '25 09:09 jnordberg

This will break legacy projects that rely on torch>2 since the expectation is non-breaking changes without going to torch 3. It will also create a need to detect APIs or support either pre-or-post refactor pytorch. Meaning that one project which could previously be used with 2.0 to 2.7 will now have to choose between 2.8 or <2.8 support. Therefore limiting the ability for let's say, Matcha TTS to be reused in new models that want to use Torch 2.8. Meanwhile, new CUDA and new GPU support is likely to push new versions of pytorch to be required.
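
The kind of API detection described above could look roughly like this sketch, which prefers load_with_torchcodec when a module exposes it and otherwise falls back to load. The helper `pick_loader` and the two stand-in namespaces are hypothetical. In real code the argument would be the imported torchaudio module.

```python
from types import SimpleNamespace


def pick_loader(audio_module):
    """Prefer load_with_torchcodec if present, otherwise fall back to load."""
    loader = getattr(audio_module, "load_with_torchcodec", None)
    return loader if loader is not None else audio_module.load


# Stand-ins for an older and a newer torchaudio, for illustration only.
old_api = SimpleNamespace(load=lambda path: ("load", path))
new_api = SimpleNamespace(
    load=lambda path: ("load", path),
    load_with_torchcodec=lambda path: ("load_with_torchcodec", path),
)
```

Feature detection of this kind avoids comparing version strings, at the cost of carrying both code paths in the project.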

rsxdalv avatar Sep 14 '25 18:09 rsxdalv

What is the viability of building old TorchAudio versions, such as 2.7.0 with new Pytorch, such as 2.8.0? It might involve setting the torchaudio to a fake version, i.e. 2.8.0 (while actually having 2.7.0's code) for ease of installation.

rsxdalv avatar Sep 14 '25 19:09 rsxdalv

This will break legacy projects that rely on torch>2 since the expectation is non-breaking changes without going to torch 3. It will also create a need to detect APIs or support either pre-or-post refactor pytorch. Meaning that one project which could previously be used with 2.0 to 2.7 will now have to choose between 2.8 or <2.8 support. Therefore limiting the ability for let's say, Matcha TTS to be reused in new models that want to use Torch 2.8. Meanwhile, new CUDA and new GPU support is likely to push new versions of pytorch to be required.

TBH I doubt they care. A few years ago they randomly dropped complex32 support for STFT... Over the years, APIs have been changing a bit erratically. I believe the torchaudio team is understaffed.

Just compare installing jax, where you have jax for cpu and jax[cuda_XX] for cuda versions. PyTorch pretty much needs a shell script to install the correct version. On Windows it installs pytorch-cpu by default, but on Linux it installs pytorch with cuda by default. It's a mess.

JuanFMontesinos avatar Sep 17 '25 10:09 JuanFMontesinos

Are you, at least, going to propose alternatives to the deprecated APIs?

For example, forced alignment? https://docs.pytorch.org/audio/stable/generated/torchaudio.functional.forced_align.html#torchaudio.functional.forced_align

empz avatar Sep 23 '25 08:09 empz

@empz the current status is still https://github.com/pytorch/audio/issues/3902#issuecomment-3160818888. In all likelihood, we'll be able to preserve forced_align and the other C++ / CUDA operators of torchaudio.

NicolasHug avatar Sep 23 '25 09:09 NicolasHug

@empz the current status is still https://github.com/pytorch/audio/issues/3902#issuecomment-3160818888. In all likelihood, we'll be able to preserve forced_align and the other C++ / CUDA operators of torchaudio.

Oh really? That's great to read.

The doc page still says it's deprecated and going to be removed in 2.9 though.

empz avatar Sep 23 '25 10:09 empz

Yes, and it will still say something along those lines in the 2.9 version that we'll publish in the next few weeks. We'd rather be overly pessimistic and raise a warning about a deprecation that may eventually not happen than delete something without a warning.

NicolasHug avatar Sep 23 '25 11:09 NicolasHug

Migration of load() and save() to TorchCodec

As we mentioned, we are consolidating the decoding and encoding capabilities of PyTorch in TorchCodec.

torchaudio.load() and torchaudio.save() are some of the most popular TorchAudio APIs, so for convenience we are providing torchaudio.load_with_torchcodec() and torchaudio.save_with_torchcodec(), which can largely be used as drop-in replacements. However, we do encourage users to directly migrate to TorchCodec's AudioDecoder() and AudioEncoder().

In future versions, torchaudio.load() and torchaudio.save() will still exist, but their underlying implementation will be relying on torchaudio.load_with_torchcodec() and torchaudio.save_with_torchcodec().

We hope for this migration to be as smooth as possible - most users should just need to pip install torchcodec, and things should still work as-is.

This is a bold assumption. You just can't make more than half of the arguments of load() and save() unsupported and claim that everything will just work for most™ users.

Because (surprise) they don't, and I already had to waste too much time on this.

Link87 avatar Oct 02 '25 11:10 Link87

I'm sorry that you are experiencing migration issues @Link87 . Please feel free to open an issue on TorchCodec, we're interested in knowing more about which features are useful.

NicolasHug avatar Oct 02 '25 12:10 NicolasHug

I just happened to find this as I was on my way over here to request a feature for torchaudio.functional.biquad, which (TIL) uses lfilter.

Just wanted to add a "+1" to the chorus of folks requesting to keep the C++-optimized filtering implementations! My perspective working on neural-amp-modeler is that torchaudio is only valuable because of these optimized routines; nothing that can be done in pure Python is particularly valuable to take a dependence for in PyTorch--those are the easy parts.

(And a big "thank you!" for doing that hard work with this library!)

sdatkinson avatar Oct 08 '25 22:10 sdatkinson

Thank you for your thoughtful feedback @sdatkinson . This is a bit lost in the discussion, so I'll re-iterate that we're actively working on preserving the C++ optimized operators of torchaudio https://github.com/pytorch/audio/issues/3902#issuecomment-3323073676. They will still show warnings along the lines of "those will be removed" in the next version, but hopefully this won't be the case.

NicolasHug avatar Oct 09 '25 10:10 NicolasHug

I think it's time for the dev community to fork this repo and take it in the direction the majority wants. In this kind of situation, sooner or later decisions always get made without asking the developers around the project anything. I thought democracy meant finding a consensus on each decision between the core team and the majority. If PyTorch is famous today, it's because of the core developers but also the thousands of developers and users using it. Without them, it would very quickly become an abandoned project. And I'm certainly not against anyone. I'm just saying it how it is.

ROBERT-MCDOWELL avatar Oct 17 '25 14:10 ROBERT-MCDOWELL

To add another positive voice to the discussion: thanks for your work ❤️

It would be fantastic if the C++ implementations could work using the new stable ABI 🙏

foxik avatar Oct 17 '25 15:10 foxik

❤️ I also very much appreciate all that the TorchAudio maintainers have contributed to the open-source world, and thank you for this thoughtful approach to communicating an unfortunate but necessary change. 🙏

arlofaria-cartesia avatar Oct 17 '25 19:10 arlofaria-cartesia