
How to load WavLM ECAPA-TDNN embeddings for speaker verification?

Open amitli1 opened this issue 1 year ago • 3 comments

According to the WavLM paper (WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing):

They used an ECAPA-TDNN model on top of the WavLM embeddings for the speaker verification downstream task.

I searched but didn't find one. Is there any implementation I can use with the model, i.e. the WavLM embeddings produced through ECAPA-TDNN?

For example:

import torch
from transformers import Wav2Vec2FeatureExtractor
from transformers import WavLMForXVector
import soundfile as sf

wav_tensor, sr = sf.read(r"myfile.wav")

device = "cuda" if torch.cuda.is_available() else "cpu"
feature_extractor_wav2vec = Wav2Vec2FeatureExtractor.from_pretrained("microsoft/wavlm-base-plus-sv")
model_wav_lm = WavLMForXVector.from_pretrained("microsoft/wavlm-base-plus-sv").to(device)

inputs = feature_extractor_wav2vec(wav_tensor,sampling_rate=16000,return_tensors="pt",padding=True).to(device)
with torch.no_grad():
    embeddings = model_wav_lm(**inputs).embeddings

I couldn't tell whether these embeddings come from an ECAPA-TDNN or from an X-Vector head.
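
For context, this is how I currently compare two utterances with these embeddings, continuing the snippet above (the file names and the decision threshold are just placeholders):

wav1, _ = sf.read("speaker_a.wav")   # placeholder file names
wav2, _ = sf.read("speaker_b.wav")

inputs = feature_extractor_wav2vec([wav1, wav2], sampling_rate=16000, return_tensors="pt", padding=True).to(device)
with torch.no_grad():
    embeddings = model_wav_lm(**inputs).embeddings
embeddings = torch.nn.functional.normalize(embeddings, dim=-1)

# cosine similarity between the two speaker embeddings
similarity = torch.nn.functional.cosine_similarity(embeddings[0], embeddings[1], dim=-1)
print("same speaker" if similarity > 0.86 else "different speakers")   # threshold is dataset-dependent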

amitli1 avatar Nov 19 '23 11:11 amitli1

@amitli1 Did you find any solution for this?

I think the code is available here https://github.com/microsoft/UniSpeech/blob/e3043e2021d49429a406be09b9b8432febcdec73/downstreams/speaker_verification/models/ecapa_tdnn.py but I didn't find any checkpoint for it. A lot of papers are using WavLM-TDNN currently, so I'm not sure what we are missing. It might be available somewhere.
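
If that ecapa_tdnn.py is the right back-end, I would expect it to be used roughly like the sketch below. The constructor arguments are assumptions based on what I remember from the repo's verification.py, and the checkpoint filename is hypothetical, since as noted I couldn't find a released one:

import torch
import soundfile as sf
# ecapa_tdnn.py from microsoft/UniSpeech, downstreams/speaker_verification/models
from models.ecapa_tdnn import ECAPA_TDNN_SMALL

# feat_dim / feat_type values are assumptions taken from the repo's verification.py
model = ECAPA_TDNN_SMALL(feat_dim=1024, feat_type="wavlm_large", config_path=None)
model.eval()

# hypothetical checkpoint path -- no official WavLM + ECAPA-TDNN checkpoint was found
state_dict = torch.load("wavlm_large_finetune.pth", map_location="cpu")
model.load_state_dict(state_dict.get("model", state_dict), strict=False)

wav, sr = sf.read("myfile.wav")                   # expects 16 kHz mono audio
wav = torch.from_numpy(wav).float().unsqueeze(0)  # (1, samples)
with torch.no_grad():
    embedding = model(wav)                        # speaker embedding, e.g. (1, 256)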

Edresson avatar Dec 29 '23 19:12 Edresson

I am also searching for the WavLM-TDNN checkpoint but found nothing. I think we need to train it ourselves using SUPERB; however, SUPERB doesn't include the ECAPA-TDNN code.
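
For anyone who does train it themselves: as I understand the paper's setup, a learnable weighted sum of the WavLM layer outputs is fed into the ECAPA-TDNN. A minimal sketch of that front-end with the Hugging Face WavLMModel (the ECAPA-TDNN back-end itself is left out; any implementation, e.g. the UniSpeech one linked above, could consume these features):

import torch
import torch.nn as nn
from transformers import WavLMModel

class WavLMWeightedSum(nn.Module):
    """Learnable weighted sum over WavLM hidden layers (SUPERB-style front-end)."""
    def __init__(self, name="microsoft/wavlm-base-plus"):
        super().__init__()
        self.wavlm = WavLMModel.from_pretrained(name)
        num_layers = self.wavlm.config.num_hidden_layers + 1   # +1 for the CNN encoder output
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, waveform):                 # waveform: (batch, samples) at 16 kHz
        out = self.wavlm(waveform, output_hidden_states=True)
        hidden = torch.stack(out.hidden_states)  # (num_layers, batch, frames, dim)
        weights = torch.softmax(self.layer_weights, dim=0).view(-1, 1, 1, 1)
        return (weights * hidden).sum(dim=0)     # (batch, frames, dim) -> feed to ECAPA-TDNN

frontend = WavLMWeightedSum()
feats = frontend(torch.randn(1, 16000))          # one second of dummy audio
print(feats.shape)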

chenyang399 avatar Apr 19 '24 07:04 chenyang399

Any updates?

maxpain avatar May 21 '24 10:05 maxpain