SenseVoice
Can we generate the transcript including audio events?
The 2nd output token is the event token.
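A rough sketch of how one might pull that token out of a decoded result, in case it helps. It assumes the special tokens show up as `<|name|>` tags at the start of the output string; that tag syntax, the helper names `split_leading_tags` and `event_tag`, and the sample string are all made up for illustration and are not taken from the SenseVoice code. The position is left as a parameter, so it is worth printing the tags from a real result to confirm which slot holds the event.

```python
import re

# Assumption: special tokens are rendered as "<|name|>" at the start of the text.
TAG_RE = re.compile(r"<\|([^|]*)\|>")


def split_leading_tags(raw: str):
    """Return (tags, text): the leading <|...|> tag names and the remaining transcript."""
    tags = []
    pos = 0
    while True:
        m = TAG_RE.match(raw, pos)  # only matches a tag starting exactly at pos
        if m is None:
            break
        tags.append(m.group(1))
        pos = m.end()
    return tags, raw[pos:].strip()


def event_tag(raw: str, position: int = 2):
    """Return the tag at a 1-based position (default: 2nd), or None if it is missing."""
    tags, _ = split_leading_tags(raw)
    if 0 < position <= len(tags):
        return tags[position - 1]
    return None


# Hypothetical output string, arranged to match the reply above.
sample = "<|en|><|Applause|><|woitn|>thank you all"
tags, text = split_leading_tags(sample)
print(tags)               # all leading tags, in order
print(text)               # transcript without the tags
print(event_tag(sample))  # the 2nd tag
```

To get a transcript that includes the audio event, the event tag can then be kept alongside the text, for example as `f"[{event_tag(sample)}] {text}"`.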