audio issues

large resampling kernels slow ALSO on the forward pass

1

### 🐛 Describe the bug I understand I still have to respond to my PR on kernel creation speed (sorry about that!), but I found another problem when...

xvdp

triaged

Initial condition support for torchaudio.functional.lfilter

5

### 🚀 The feature scipy.signal.lfilter supports an initial condition zi. This is critical for dealing with streaming inputs where we get one sample at a time. This feature is currently...

shchhan123456
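
To illustrate why an initial condition matters for streaming, here is a minimal pure-Python sketch of a stateful IIR filter. It is neither torchaudio's nor scipy's implementation. The helper name `lfilter_chunk` and the coefficients are hypothetical, and the direct-form II transposed layout is an assumption chosen because it keeps the state in a short delay line. The function returns the final state alongside the output so the next chunk can resume from it, which is the role `zi` plays in the request above.

```python
def lfilter_chunk(b, a, x, zi=None):
    """Filter one chunk `x`; return (y, zf) so the caller can carry state.

    Hypothetical helper: direct-form II transposed IIR filter in pure Python.
    """
    n = max(len(a), len(b))
    a0 = a[0]
    # Normalize so the leading denominator coefficient is 1, then pad
    # both coefficient lists to the same length.
    b = [c / a0 for c in b] + [0.0] * (n - len(b))
    a = [c / a0 for c in a] + [0.0] * (n - len(a))
    # The delay line holds n - 1 values; start from rest if no state is given.
    z = list(zi) if zi is not None else [0.0] * (n - 1)
    y = []
    for xn in x:
        yn = b[0] * xn + (z[0] if z else 0.0)
        # Update the delay line in ascending order so z[i + 1] is still
        # the value from the previous sample when it is read.
        for i in range(n - 2):
            z[i] = b[i + 1] * xn + z[i + 1] - a[i + 1] * yn
        if n > 1:
            z[n - 2] = b[n - 1] * xn - a[n - 1] * yn
        y.append(yn)
    return y, z


# Illustrative first-order filter and input (values are made up).
b, a = [0.2, 0.3], [1.0, -0.5]
signal = [1.0, 0.0, 0.0, 2.0, -1.0, 0.5]

# Reference: filter the whole signal in one call.
full, _ = lfilter_chunk(b, a, signal)

# Streaming: one sample at a time, passing the state forward.
state = None
streamed = []
for sample in signal:
    out, state = lfilter_chunk(b, a, [sample], zi=state)
    streamed.extend(out)

# Without carried state, each call restarts from rest and the output differs.
stateless = [lfilter_chunk(b, a, [s])[0][0] for s in signal]
```

In this sketch `streamed` matches `full`, while `stateless` does not, which is the gap the requested initial-condition argument would close.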

Metadata mode for torchaudio.dataset

6

### 🚀 The feature Hello, Thanks for the handy tools for parsing the database! I am wondering if it is possible to let the torchaudio.dataset classes have two modes: 1....

leo19941227

RFC: The future of Kaldi compliance module

24

# Request For Comment: The future of Kaldi-compatible features ## Problems `torchaudio.compliance.kaldi` implements functionality that tries to reproduce Kaldi's feature extraction, and this module has many issues and is causing headache...

mthrok

Kaldi

RFC

Wav2vec2 output is affected by zero-padding

11

### 🐛 Describe the bug I've found that the output of the wav2vec2 pipeline model is bugged, and changes depending on the zero-padding used in batch preprocessing, a simple example...

JackPfizer

bug

help wanted

module: models

The apply_codec function does not behave normally

2

### 🐛 Describe the bug I am trying to convert an audio tensor into 'gsm' format to simulate a communication process with the apply_codec function. However, besides a transformed tensor, this...

changtaoli

Proposal for the integration of Tree-constrained Pointer Generator and Minimum Biasing Word Error (MBWE) training for contextual ASR

2

### 🚀 The feature I’d like to propose the integration of tree-constrained pointer generator (TCPGen) [1] and Minimum Biasing Word Error (MBWE) training [2] for contextual biasing into torchaudio package....

BriansIDP

Enable ROCm RNN-T Loss

3

Added HIPIFY code and small changes for ROCm. Targeting RNN-T loss.

jpvillam-amd

cla signed

module: rocm

Need more detail and tutorial on how to use the language model to decrease the word rate error.

7

### 📚 The doc issue 1. How do we build our own language model and add it to the language model, such as wav2vec2? However many of the solutions from...

AliceSum

[v0.12] torchaudio.info reports num_frames=0 for MP3

8

### 🐛 Describe the bug First, download a `wav` and a `mp3` file: ``` wget https://filesamples.com/samples/audio/wav/sample3.wav wget https://filesamples.com/samples/audio/mp3/sample3.mp3 ``` Here is a short repro: ```python import torchaudio # try reading...

iceychris
