audio issues

Update stable symlink to 2.3.0.

1

This is in preparation for release 2.3.0.

ahmadsharif1

CLA Signed

Fix encoding for commonvoice.py

1

On Windows, this defaults to cp1252, an incorrect encoding for this file.

jacobjennings

CLA Signed

Can not load commonvoice dataset on windows

1

### 🐛 Describe the bug

When loading the Common Voice dataset on Windows, the file `train.tsv` is read using the cp1252 file encoding, which leads to a failure.

```
training_speech_dataset = torchaudio.datasets.COMMONVOICE(root=base_dataset_cache_directory)...
```

jacobjennings
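The encoding problem reported above can be reproduced without torchaudio. The following is a minimal sketch, not the actual change made to `commonvoice.py`: it writes a small UTF-8 TSV file (the file contents and column names are made up for illustration) and reads it back twice, once with an explicit `encoding="utf-8"` and once with `cp1252`, which is what a bare `open()` typically falls back to on Windows.

```python
import tempfile
from pathlib import Path

# Illustrative TSV content; the column names and sentence are assumptions.
tsv_content = "client_id\tsentence\nabc123\tÁngel está aquí\n"

with tempfile.TemporaryDirectory() as tmp:
    tsv_path = Path(tmp) / "train.tsv"
    # Write raw UTF-8 bytes so the file is identical on every platform.
    tsv_path.write_bytes(tsv_content.encode("utf-8"))

    # Passing the encoding explicitly makes the result platform independent.
    with open(tsv_path, "r", encoding="utf-8") as f:
        utf8_text = f.read()

    # Simulate the Windows default. Depending on the bytes involved, cp1252
    # either raises UnicodeDecodeError or silently returns garbled text.
    try:
        with open(tsv_path, "r", encoding="cp1252") as f:
            cp1252_text = f.read()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_text = None
        cp1252_failed = True

print("utf-8 round trip ok:", utf8_text == tsv_content)
print("cp1252 raised UnicodeDecodeError:", cp1252_failed)
```

Passing `encoding="utf-8"` at the call site avoids depending on the locale default, which is why the same code can behave differently on Linux and Windows.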

Support for 10bit / 12bit encoding (e.g. yuv420p10le) in StreamWriter

### 🚀 The feature

The ability to provide 16 bit data (`torch.int16`) as input to `StreamWriter` with the understanding that the data will be truncated to 10/12 bit depending on...

tvercaut

Cannot load audio from pathlib.Path

1

### 🐛 Describe the bug

Running the following:

```python
import torchaudio
from pathlib import Path

test_audio_path = Path('test.wav')
torchaudio.load(test_audio_path)
```

Produces the following error:

```
Traceback (most recent call last):...
```

roedoejet
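A possible workaround for the `pathlib.Path` report above, assuming the loader accepts plain string paths, is to convert the path before passing it on. The helper name `as_str_path` below is hypothetical and not part of torchaudio; the sketch only shows the conversion and leaves the actual `torchaudio.load` call commented out.

```python
import os
from pathlib import Path


def as_str_path(path):
    """Hypothetical helper: convert a str or os.PathLike object to its plain path form."""
    return os.fspath(path)


test_audio_path = Path("test.wav")
uri = as_str_path(test_audio_path)

# Assumes torchaudio is installed and test.wav exists:
# waveform, sample_rate = torchaudio.load(uri)
print(type(uri).__name__, uri)
```

`os.fspath` is used rather than `str()` so that objects which are not path-like raise a `TypeError` instead of being silently stringified.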

Using MMS model with `star` token for batch size > 1

1

The current implementation assumes a batch size of one when attaching the `star` dimension: https://github.com/pytorch/audio/blob/ea437b31ce316ea3d66fe73768c0dcb94edb79ad/src/torchaudio/pipelines/_wav2vec2/utils.py#L41 However, the underlying Wav2Vec2 model supports batch sizes greater than one. So this line should instead...

huangruizhe
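One way to make the `star` dimension batch-aware is sketched below. The suggested fix in the report above is truncated, so this is not necessarily what the author proposed. It assumes the emission tensor has shape `(batch, frames, classes)` and that the extra `star` column is filled with zeros; the function name `append_star_dim` is hypothetical.

```python
import torch


def append_star_dim(emission: torch.Tensor) -> torch.Tensor:
    """Append one extra class column for the `star` token.

    Assumes `emission` has shape (batch, frames, classes). The new column is
    zero-filled (an assumption) and sized from the input, so any batch size works.
    """
    batch, frames, _ = emission.shape
    star_dim = torch.zeros(
        (batch, frames, 1), dtype=emission.dtype, device=emission.device
    )
    return torch.cat((emission, star_dim), dim=-1)


# Example with a batch of 4 utterances, 50 frames and 29 classes (made-up sizes).
emission = torch.randn(4, 50, 29)
out = append_star_dim(emission)
print(out.shape)
```

Taking the batch size from the input rather than hard-coding `1` keeps the concatenation valid for any batch size, since `torch.cat` requires all non-concatenated dimensions to match.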

Do not use channel_layout in StreamReader

1

We only care about the number of channels, so there is no need to create channel_layout. One can pass the number of channels directly to the filter. Also, int64 channel_layout is a deprecated...

mthrok

CLA Signed

StreamReader seek method seeks to wrong frame for opus format

2

### 🐛 Describe the bug

StreamReader `seek` does not seek to the correct frame, even with `mode='precise'`. Use the code below to reproduce the error with any audio in opus format. This code...

ashinkajay

cherry-picks for 2.3

1

ahmadsharif1

Fixed docs misspelling: netowrk -> network

3

PLEASE NOTE THAT THE TORCHAUDIO REPOSITORY IS NO LONGER ACTIVELY MONITORED. You may not get a response. For open discussions, visit https://discuss.pytorch.org/.

teddy-aisoft

CLA Signed
