SoundStream for PyTorch
Unofficial SoundStream implementation in PyTorch, with training code and a 16 kHz pretrained checkpoint.
The 16 kHz pretrained model was trained on LibriSpeech train-clean-100 on an NVIDIA T4 for about 150 epochs (around 50 hours in total). The model is not causal.
```python
import torch
import torchaudio

# Load the 16 kHz pretrained checkpoint through torch.hub.
model = torch.hub.load("kaiidams/soundstream-pytorch", "soundstream_16khz")

# Load the input and resample it to the 16 kHz rate the model was trained on.
x, sr = torchaudio.load('input.wav')
x, sr = torchaudio.functional.resample(x, sr, 16000), 16000

with torch.no_grad():
    y = model.encode(x)
    # y = y[:, :, :4]  # if you want to reduce code size.
    z = model.decode(y)
torchaudio.save('output.wav', z, sr)
```
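The commented-out line `y = y[:, :, :4]` in the example above presumably keeps only the first four code indices per frame, which trades reconstruction quality for a smaller code. As a rough sketch of that trade-off, the helper below estimates the bitrate of a residual-vector-quantized code. The function `bitrate_bps` is hypothetical and not part of this repository, and the frame rate (50 frames per second) and codebook size (1024 entries) used in the example calls are illustrative assumptions, not values taken from the pretrained checkpoint.

```python
import math


def bitrate_bps(frame_rate_hz: float, num_quantizers: int, codebook_size: int) -> float:
    """Estimate the bitrate of a residual-vector-quantized code in bits per second.

    Each frame stores one index per quantizer, and each index costs
    log2(codebook_size) bits. All arguments are illustrative assumptions.
    """
    bits_per_index = math.log2(codebook_size)
    return frame_rate_hz * num_quantizers * bits_per_index


# Assumed values for illustration only: 50 frames/s and 1024-entry codebooks.
full = bitrate_bps(50.0, 8, 1024)     # keep eight indices per frame
reduced = bitrate_bps(50.0, 4, 1024)  # keep four indices per frame, as in y[:, :, :4]
print(full, reduced)
```

Under these assumptions, halving the number of indices kept per frame halves the bitrate. The actual figures depend on the checkpoint's stride and codebook size.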
Sample audio
Audio references are sampled from LibriSpeech test-clean.
| Reference | SoundStream |
|---|---|
| audio link | audio link |
| audio link | audio link |
| audio link | audio link |