SenseVoice
Extracting emotion probabilities
I want to obtain the probabilities of the 8 emotions:
"<|HAPPY|>", "<|SAD|>", "<|ANGRY|>", "<|NEUTRAL|>", "<|FEARFUL|>", "<|DISGUSTED|>", "<|SURPRISED|>", "<|OTHER|>". Should I take the logits at the corresponding token IDs from the second frame (index 1) of the CTC output?
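    for i in range(b):
        # Keep only the valid (unpadded) frames of utterance i
        x = ctc_logits[i, : encoder_out_lens[i].item(), :]
        yseq = x.argmax(dim=-1)
        # Second frame (index 1), all vocabulary entries
        emotion_logits = x[1, :]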
Can I do this?
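
To make the question concrete, here is a minimal sketch of what I mean by "probabilities of 8 emotions": take one frame's logits, pick out the 8 emotion-token positions, and apply a softmax over just those. It is plain Python so that it runs without the model. The function name `emotion_probs`, the `token_to_id` mapping, and the toy IDs are all hypothetical; the real IDs would have to come from the model's tokenizer, and whether index 1 is the right frame is exactly what I am asking.

```python
import math

EMOTION_TOKENS = [
    "<|HAPPY|>", "<|SAD|>", "<|ANGRY|>", "<|NEUTRAL|>",
    "<|FEARFUL|>", "<|DISGUSTED|>", "<|SURPRISED|>", "<|OTHER|>",
]


def emotion_probs(frame_logits, token_to_id, tokens=EMOTION_TOKENS):
    """Softmax restricted to the emotion-token entries of a single frame.

    frame_logits: sequence of floats, one per vocabulary entry.
    token_to_id:  hypothetical mapping from token string to vocabulary index.
    """
    selected = [frame_logits[token_to_id[t]] for t in tokens]
    peak = max(selected)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in selected]
    total = sum(exps)
    return {t: e / total for t, e in zip(tokens, exps)}


# Toy data only: these IDs are made up and are NOT the real SenseVoice token IDs.
toy_ids = {t: i + 2 for i, t in enumerate(EMOTION_TOKENS)}
toy_frame = [0.0] * 12
toy_frame[toy_ids["<|HAPPY|>"]] = 3.0

probs = emotion_probs(toy_frame, toy_ids)
best = max(probs, key=probs.get)
```

With the tensors from my snippet, the equivalent would be something like `x[1, ids].softmax(dim=-1)`, where `ids` is the list of the 8 emotion token IDs. Note that this renormalizes over the 8 emotion tokens only, so the values differ from a softmax over the full vocabulary. If `ctc_logits` already holds log-probabilities, a softmax over the subset still gives probabilities that sum to 1 across the 8 emotions.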