audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
### 🚀 The feature In some research cases, Wav2Vec2 or HuBERT is expected to be frozen (i.e. ``requires_grad=False`` is set for all params). - Users use it as a feature...
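Freezing a model as described above can be sketched as follows. This is a minimal illustration, not the API proposed in the issue: `freeze` is a hypothetical helper, and a small `nn.Sequential` stands in for Wav2Vec2 or HuBERT so that the snippet runs without downloading weights.

```python
import torch
from torch import nn


def freeze(module: nn.Module) -> nn.Module:
    """Disable gradients for every parameter (hypothetical helper)."""
    for param in module.parameters():
        param.requires_grad_(False)
    # eval() also turns off dropout and similar train-only behaviour.
    module.eval()
    return module


# Stand-in for a pre-trained encoder; an assumption for illustration only.
encoder = freeze(nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4)))
head = nn.Linear(4, 2)  # trainable layer on top of the frozen features

features = encoder(torch.randn(3, 16))
loss = head(features).sum()
loss.backward()  # gradients flow into `head` only
```

After `backward()`, the encoder parameters have no `.grad`, while `head.weight.grad` is populated, which is the behaviour wanted when the encoder is used purely as a feature extractor.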
https://github.com/pytorch/audio/pull/2025
### 🚀 The feature torchaudio recently added a mask-based MVDR beamforming module, which takes the multi-channel noisy STFT and the estimated time-frequency masks as input and generates the single-channel enhanced...
This is a reminder to update the speech recognition tutorial before the release of v0.11. We have added model surgery to the pre-trained wav2vec2 model so as to remove unused dimensions at...
### 🚀 The feature In ``torchaudio.transforms.MVDR``, the trace of a multi-dimensional tensor is computed via a ``_get_mat_trace`` method because PyTorch lacks native support. There is an [ongoing PR](https://github.com/pytorch/pytorch/pull/62714)...
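For context, a batched trace over the last two dimensions can be written with `torch.diagonal`. The sketch below is one common way to do it and is not claimed to match the internals of `_get_mat_trace`; the name `batched_trace` is hypothetical.

```python
import torch


def batched_trace(x: torch.Tensor) -> torch.Tensor:
    """Trace of each matrix in the last two dimensions (hypothetical helper)."""
    # diagonal(...) returns shape (..., n); summing the last axis gives the trace.
    return torch.diagonal(x, dim1=-2, dim2=-1).sum(dim=-1)


# Two 2x2 matrices: [[0, 1], [2, 3]] and [[4, 5], [6, 7]].
mats = torch.arange(8.0).reshape(2, 2, 2)
traces = batched_trace(mats)
```

Here `traces` has shape `(2,)`, one value per matrix, and any leading batch dimensions are preserved.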
When vocoding a single example, folding the input into a batch of chunks lets inference run faster. https://github.com/pytorch/audio/blob/31dbb7540c78fe5d176948764cf9a20f55ac80dc/examples/pipeline_wavernn/wavernn_inference_wrapper.py#L167-L177 I excluded it from the initial tacotron2 pipeline, due...
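The folding idea can be sketched as below: a single sequence is cut into overlapping windows that are stacked along a batch dimension. This is a simplified illustration under assumed parameters, not the code in the linked wrapper; `fold_into_chunks`, `chunk`, and `overlap` are hypothetical names, and the cross-fade needed to stitch the vocoder outputs back together is omitted.

```python
import torch
import torch.nn.functional as F


def fold_into_chunks(x: torch.Tensor, chunk: int, overlap: int) -> torch.Tensor:
    """Split a 1-D sequence into a (num_chunks, chunk) batch of overlapping windows."""
    step = chunk - overlap
    n = x.numel()
    # Zero-pad the tail so that the final window is complete.
    pad = chunk - n if n <= chunk else (-(n - chunk)) % step
    x = F.pad(x, (0, pad))
    # unfold(dim, size, step) slides a window of `size` with stride `step`.
    return x.unfold(0, chunk, step)


signal = torch.arange(11.0)  # stand-in for one input example
batch = fold_into_chunks(signal, chunk=4, overlap=2)
```

Each row of `batch` can then be processed in parallel as an independent batch item, which is where the speed-up comes from.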
HiFi-GAN is a popular and efficient TTS model. https://arxiv.org/abs/2010.05646
It would be interesting to add Torch-native CTC segmentation. ref - https://github.com/lumaku/ctc-segmentation - https://arxiv.org/abs/2007.09127
### 🐛 Describe the bug Hi, I am trying to run the interactive ASR demo given [here](https://github.com/pytorch/audio/tree/main/examples/interactive_asr). However, I am getting the following error: ```text Traceback (most recent call last):...