icefall
Prompt ASR
This PR is only for illustration purposes and is not for merge.
It implements prompt-ASR, where the model receives not only speech but also content and style prompts.
The content prompt could be the pre-text associated with the current sentence, and the style prompt describes the desired style of the output transcript (e.g. mixed-case with punctuation, all uppercase without punctuation, ...). Both prompts should be in text format.
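To make the two prompt types concrete, here is a minimal sketch of how they might be prepared for one utterance. It assumes the content prompt is simply the last few words of the preceding transcripts and that the style is conveyed by a short text sample written in that style. `build_prompts`, `to_upper_no_punc` and `max_words` are hypothetical names for illustration only, not the API in this PR.

```python
import re
from typing import List, Tuple


def to_upper_no_punc(text: str) -> str:
    """Convert mixed-case, punctuated text to all-uppercase without punctuation.

    Apostrophes are kept so that words like "it's" survive intact.
    """
    text = re.sub(r"[^\w\s']", "", text)  # drop punctuation except apostrophes
    return " ".join(text.upper().split())  # upper-case and collapse whitespace


def build_prompts(
    history: List[str], style_prompt: str, max_words: int = 50
) -> Tuple[str, str]:
    """Return a (content_prompt, style_prompt) pair for one utterance (hypothetical helper).

    The content prompt is the last `max_words` words of the preceding
    transcripts (the "pre-text"); the style prompt is passed through as-is.
    """
    words = " ".join(history).split()
    # Guard max_words == 0: words[-0:] would return the whole list.
    content_prompt = " ".join(words[-max_words:]) if max_words > 0 else ""
    return content_prompt, style_prompt


if __name__ == "__main__":
    previous = ["Good morning, everyone.", "Today we discuss speech recognition."]
    content, style = build_prompts(previous, "Mixed-case English text, with punctuation.")
    print(content)
    # The same pre-text rendered in the "all uppercase without punctuation" style.
    print(to_upper_no_punc(content))
```

Truncating the pre-text keeps the prompt length bounded; how much history the real model uses is not stated here.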
@marcoyang1998 Great job! Is there any new progress?
@pingfengluo Hi, we've run many experiments and the results are promising. Given the pre_text (e.g. transcriptions of the previous utterances), we obtain a 5-10% relative WER reduction (WERR) on our new dataset, Libriheavy (#1175). The model can also change the style of its output when given a "style prompt".
Once we finalize the model, we will update here.
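For reference, a relative WERR is measured against the baseline WER rather than in absolute percentage points: a baseline of 5.0% dropping to 4.6% is a 0.4-point absolute reduction but an 8% relative one. A small helper (hypothetical name, illustrative numbers only, not results from this PR):

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction (WERR) in percent: 100 * (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical WERs for illustration only.
    print(f"{relative_wer_reduction(5.0, 4.6):.1f}%")
```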
@marcoyang1998 Cool! Looking forward to your results. Speech prompts have great potential for solving multiple speech-related tasks, including speech recognition, punctuation prediction and timestamp prediction, as in Whisper. Speech prompts could also be a way to support large-scale multimodal models that integrate speech and text, such as AudioPaLM (or SpeechGPT).