RE-VERB
What is an "utterance" in this case?
So the input data has shape (speaker, utterance, log filterbanks) and the output has shape (speaker, utterance, embeddings). What does the utterance dimension refer to in this case?