
Data manipulation and transformation for audio signal processing, powered by PyTorch

Results 324 audio issues

### 🚀 The feature In some research cases, the Wav2Vec2 or HuBERT model is expected to be frozen (i.e. set ``requires_grad=False`` for all parameters). - Users use it as a feature...

improvement
module: pipelines
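
For context on the freezing request above, here is a minimal sketch of how a module's parameters can be frozen in plain PyTorch. The `freeze` helper and the small `encoder` stand-in are hypothetical names used only for illustration; they are not part of torchaudio, and a real Wav2Vec2 or HuBERT model would take the place of the stand-in.

```python
import torch
from torch import nn


def freeze(module: nn.Module) -> nn.Module:
    """Disable gradients for every parameter and switch the module to eval mode."""
    for param in module.parameters():
        param.requires_grad_(False)
    return module.eval()


# Hypothetical stand-in for a pre-trained encoder (assumption, not a torchaudio model).
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
# Trainable downstream layer that consumes the frozen features.
head = nn.Linear(8, 4)

freeze(encoder)
features = encoder(torch.randn(2, 16))
loss = head(features).sum()
loss.backward()

# Frozen parameters never receive gradients; the head still does.
frozen_grads = [p.grad for p in encoder.parameters()]
```

Calling `eval()` in the same helper is a design choice: it also disables dropout and similar training-only behaviour, which is usually wanted when the model serves as a fixed feature extractor.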

### 🚀 The feature torchaudio recently added a mask-based MVDR beamforming module, which takes the multi-channel noisy STFT and the estimated time-frequency masks as input and generates the single-channel enhanced...

This is a reminder to update the speech recognition tutorial before the release of v0.11. We have added model surgery to the pre-trained wav2vec2 model so as to remove unused dimensions at...

### 🚀 The feature In ``torchaudio.transforms.MVDR`` the trace of the multi-dimensional tensor is computed via a ``_get_mat_trace`` method due to the lack of PyTorch support. There is an [ongoing PR](https://github.com/pytorch/pytorch/pull/62714)...

enhancement
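
As an illustration of the trace computation discussed above, the following sketch shows one way to take the trace over the last two dimensions of a batched tensor using existing PyTorch operations. The function name `batched_trace` is hypothetical, and this is not claimed to match what `_get_mat_trace` does internally.

```python
import torch


def batched_trace(mat: torch.Tensor) -> torch.Tensor:
    """Return the trace over the last two dimensions of a (..., n, n) tensor."""
    # torch.diagonal extracts the diagonal of each trailing matrix as the last dim.
    return torch.diagonal(mat, dim1=-2, dim2=-1).sum(dim=-1)


# Two 3x3 matrices holding the values 0..17.
x = torch.arange(2 * 3 * 3, dtype=torch.float32).reshape(2, 3, 3)
traces = batched_trace(x)  # one trace per matrix in the batch
```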

When vocoding a single example, folding the input into a batch of chunks lets inference run faster. https://github.com/pytorch/audio/blob/31dbb7540c78fe5d176948764cf9a20f55ac80dc/examples/pipeline_wavernn/wavernn_inference_wrapper.py#L167-L177 I excluded it from the initial tacotron2 pipeline, due...
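
The folding idea can be sketched roughly as follows: a single spectrogram is cut into overlapping chunks that are stacked along a new batch dimension, so the vocoder can process them in parallel. This is a simplified illustration under assumed names (`fold_into_chunks`, `chunk_len`, `overlap`) and zero-padding of the tail; it is not the code in the linked wrapper, and it omits the cross-fading needed to stitch the outputs back together.

```python
import torch


def fold_into_chunks(spec: torch.Tensor, chunk_len: int, overlap: int) -> torch.Tensor:
    """Fold a (n_mels, time) tensor into (n_chunks, n_mels, chunk_len) overlapping chunks."""
    _, time = spec.shape
    hop = chunk_len - overlap
    # Ceiling division; at least one chunk even for very short inputs.
    n_chunks = max(1, -(-(time - overlap) // hop))
    padded_len = n_chunks * hop + overlap
    # Zero-pad the time axis so the last chunk is full length.
    spec = torch.nn.functional.pad(spec, (0, padded_len - time))
    # unfold yields (n_mels, n_chunks, chunk_len); move chunks to the batch dim.
    return spec.unfold(1, chunk_len, hop).permute(1, 0, 2)


# Toy input: 2 "mel" bins, 11 frames.
spec = torch.arange(22, dtype=torch.float32).reshape(2, 11)
chunks = fold_into_chunks(spec, chunk_len=4, overlap=1)
```

Consecutive chunks share `overlap` frames, which gives a later unfolding step room to blend chunk boundaries.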

HiFi-GAN is a popular/efficient TTS model. https://arxiv.org/abs/2010.05646

It would be interesting to add Torch-native CTC segmentation. ref - https://github.com/lumaku/ctc-segmentation - https://arxiv.org/abs/2007.09127

### 🐛 Describe the bug Hi I am trying to run the interactive ASR demo given [here](https://github.com/pytorch/audio/tree/main/examples/interactive_asr). However I am getting the following error ```text Traceback (most recent call last):...

wontfix