unilm
How to load WavLM ECAPA-TDNN embeddings for speaker verification?
According to the WavLM paper (WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing),
they used an ECAPA-TDNN embedding model for the downstream speaker verification task.
I searched but didn't find one: is there any implementation I can use with the model (i.e. WavLM embeddings produced by ECAPA-TDNN)?
For example:
import torch
from transformers import Wav2Vec2FeatureExtractor
from transformers import WavLMForXVector
import soundfile as sf

wav_tensor, sr = sf.read(r"nyfile.wav")

device = "cuda" if torch.cuda.is_available() else "cpu"
feature_extractor_wav2vec = Wav2Vec2FeatureExtractor.from_pretrained("microsoft/wavlm-base-plus-sv")
model_wav_lm = WavLMForXVector.from_pretrained("microsoft/wavlm-base-plus-sv").to(device)
inputs = feature_extractor_wav2vec(wav_tensor, sampling_rate=16000, return_tensors="pt", padding=True).to(device)
with torch.no_grad():
    embeddings = model_wav_lm(**inputs).embeddings
I couldn't tell whether these embeddings come from ECAPA-TDNN or from an x-vector model.
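One way to check is to look at the layers that WavLMForXVector puts on top of the WavLM backbone. The snippet below is only a sketch: it builds a randomly initialised model from a WavLMConfig shrunk to one transformer layer (my own choice, so that nothing has to be downloaded) and lists the head modules. It inspects the transformers class, not the weights of microsoft/wavlm-base-plus-sv, and the variable names are illustrative.

```python
from collections import Counter

from transformers import WavLMConfig, WavLMForXVector

# Random weights, single transformer layer: enough to inspect the architecture
# without downloading a checkpoint (assumption: the head does not depend on depth).
config = WavLMConfig(num_hidden_layers=1)
model = WavLMForXVector(config)

# Top-level submodules other than the WavLM backbone, i.e. the embedding head.
head = {
    name: type(module).__name__
    for name, module in model.named_children()
    if name != "wavlm"
}
for name, cls in head.items():
    print(f"{name}: {cls}")

# Count every layer type that appears inside the head.
layer_types = Counter(
    type(layer).__name__
    for name, child in model.named_children()
    if name != "wavlm"
    for layer in child.modules()
)
print(layer_types)
```

The head should show a stack of plain TDNN layers followed by linear layers, which is the x-vector style head, so the embeddings returned by the code above appear to come from an x-vector head rather than from ECAPA-TDNN.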
@amitli1 Did you find any solution for this?
I think the code is available here https://github.com/microsoft/UniSpeech/blob/e3043e2021d49429a406be09b9b8432febcdec73/downstreams/speaker_verification/models/ecapa_tdnn.py but I didn't find any checkpoint for it. A lot of papers are using WavLM-TDNN currently, so I'm not sure what we are missing. It might be available somewhere.
I am also searching for the WavLM-TDNN checkpoint but have found nothing. I think we need to train it ourselves using SUPERB; however, SUPERB doesn't have the ECAPA-TDNN code.
Any updates?