SenseVoice

Extracting emotion probabilities

Open JingRH opened this issue 7 months ago • 0 comments

I want to obtain the probabilities of the 8 emotions:

"<|HAPPY|>", "<|SAD|>", "<|ANGRY|>", "<|NEUTRAL|>", "<|FEARFUL|>", "<|DISGUSTED|>", "<|SURPRISED|>", "<|OTHER|>"

Should I take the logits at the corresponding token ID positions from the second frame?

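    for i in range(b):
        # Keep only the valid (unpadded) frames of utterance i
        x = ctc_logits[i, : encoder_out_lens[i].item(), :]
        yseq = x.argmax(dim=-1)

        # Second frame (index 1): one score per token ID
        emotion_logits = x[1, :]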

Can I do this?
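
To make the question concrete, here is a minimal sketch of what "taking the logits at the corresponding token ID positions" could look like once a frame has been selected. It assumes the second frame really is where the emotion token is emitted, which is exactly the open question above. The names `emotion_probs`, `toy_ids` and `toy_frame` are hypothetical, and the token IDs are made up for illustration; the real IDs would have to be looked up in the model's tokenizer. Plain Python lists stand in for the tensor row `x[1, :]`.

```python
import math

EMOTIONS = [
    "<|HAPPY|>", "<|SAD|>", "<|ANGRY|>", "<|NEUTRAL|>",
    "<|FEARFUL|>", "<|DISGUSTED|>", "<|SURPRISED|>", "<|OTHER|>",
]


def emotion_probs(frame_scores, token_ids):
    """Softmax restricted to the emotion entries of one frame's scores.

    frame_scores: one score per token ID (e.g. the row x[1, :] as a list).
    token_ids:    dict mapping each emotion tag to its token ID.
    """
    picked = [frame_scores[token_ids[e]] for e in EMOTIONS]
    # Subtract the max before exponentiating for numerical stability.
    top = max(picked)
    exps = [math.exp(s - top) for s in picked]
    total = sum(exps)
    return {e: v / total for e, v in zip(EMOTIONS, exps)}


# Toy data only: hypothetical token IDs 10..17 in a 32-entry vocabulary.
toy_ids = {e: 10 + k for k, e in enumerate(EMOTIONS)}
toy_frame = [0.0] * 32
toy_frame[toy_ids["<|SAD|>"]] = 2.0  # make one emotion stand out

probs = emotion_probs(toy_frame, toy_ids)
for tag, p in probs.items():
    print(f"{tag}: {p:.3f}")
```

Because softmax is unchanged by adding a constant to every score, this renormalisation gives the same 8 probabilities whether the frame holds raw logits or log-softmax outputs. The result is a distribution over the 8 emotion tags only; any probability mass the model puts on other tokens in that frame is discarded.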

JingRH, Jun 06 '25 03:06