audio
Data manipulation and transformation for audio signal processing, powered by PyTorch
On Windows, this defaults to cp1252, which is the wrong encoding for this file.
### 🐛 Describe the bug When loading the Common Voice dataset on Windows, the file `train.tsv` is read using the cp1252 file encoding, which causes a failure. ``` training_speech_dataset = torchaudio.datasets.COMMONVOICE(root=base_dataset_cache_directory)...
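The encoding problem described above can be illustrated with a minimal sketch that uses only the Python standard library. It is not torchaudio's actual loader code: the helper name `read_tsv_rows` and the sample file contents are hypothetical and made up for this illustration. The point is that passing `encoding="utf-8"` explicitly to `open` avoids relying on the platform default, which on Windows is typically cp1252.

```python
import csv
import tempfile
from pathlib import Path


def read_tsv_rows(tsv_path):
    """Read a tab-separated file as UTF-8, regardless of the platform default.

    Hypothetical helper for illustration; not part of torchaudio.
    """
    # Without encoding="utf-8", open() falls back to the locale encoding,
    # which is commonly cp1252 on Windows.
    with open(tsv_path, "r", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader)
        rows = list(reader)
    return header, rows


# Write a small UTF-8 sample file (made-up contents) and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "train.tsv"
    sample_path.write_text(
        "client_id\tpath\tsentence\nabc\tclip.mp3\tnaïve café\n",
        encoding="utf-8",
    )
    header, rows = read_tsv_rows(sample_path)
```

Naming the encoding at the call site keeps the behavior identical across operating systems, which is usually preferable to depending on the locale of the machine that runs the code.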
### 🚀 The feature The ability to provide 16 bit data (`torch.int16`) as input to `StreamWriter` with the understanding that the data will be truncated to 10/12 bit depending on...
### 🐛 Describe the bug Running the following: ```python import torchaudio from pathlib import Path test_audio_path = Path('test.wav') torchaudio.load(test_audio_path) ``` Produces the following error: ``` Traceback (most recent call last):...
The current implementation assumes a batch size of one when attaching the `star` dimension: https://github.com/pytorch/audio/blob/ea437b31ce316ea3d66fe73768c0dcb94edb79ad/src/torchaudio/pipelines/_wav2vec2/utils.py#L41 However, the underlying Wav2vec model supports batch sizes greater than one. So this line should instead...
We only care about the number of channels, so there is no need to create channel_layout. One can pass the number of channels directly to the filter. Also, int64 channel_layout is a deprecated...
### 🐛 Describe the bug `StreamReader` `seek` does not seek to the correct frame, even with `mode='precise'`. Use the code below to reproduce the error with any audio in Opus format. This code...
PLEASE NOTE THAT THE TORCHAUDIO REPOSITORY IS NO LONGER ACTIVELY MONITORED. You may not get a response. For open discussions, visit https://discuss.pytorch.org/.