SpeechT5
Does the pre-trained model for the hidden-unit tokenizer use speaker embeddings?
Could you please elaborate on the role of speaker embeddings in the hidden-unit tokenizer and what effect they have?