icefall Prompt ASR

Prompt ASR

Open marcoyang1998 opened this issue 2 years ago • 3 comments

This PR is only for illustration purposes and is not for merge.

It implements prompt-ASR, where the model receives not only speech but also content and style prompts.

The content prompt could be the pre-text associated with the current sentence, and the style prompt is the desired style of the output transcript (e.g., mixed-case with punctuation, or all uppercase without punctuation). Both prompts should be in text format.
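
To make the two prompt types concrete, here is a minimal sketch of how a content prompt and a style prompt might be prepared as plain text. It is not taken from this PR: `PromptedInput`, `make_input`, and `upper_no_punct` are hypothetical names, and the example sentences are made up for illustration.

```python
import string
from dataclasses import dataclass

# Hypothetical helper names; none of these come from the icefall PR.
# Translation table that deletes ASCII punctuation but keeps apostrophes.
_PUNCT = str.maketrans("", "", string.punctuation.replace("'", ""))


def upper_no_punct(text: str) -> str:
    """Render text as all uppercase without punctuation (apostrophes kept)."""
    return " ".join(text.translate(_PUNCT).upper().split())


@dataclass
class PromptedInput:
    content_prompt: str  # pre-text, e.g. the transcript of the preceding utterance
    style_prompt: str    # a text sample written in the desired output style


def make_input(pre_text: str, style_sample: str) -> PromptedInput:
    """Bundle the two text prompts that accompany the speech input."""
    return PromptedInput(content_prompt=pre_text, style_prompt=style_sample)


mixed_case = "Well, I never thought of that."
example = make_input(
    pre_text="He paused for a moment.",
    style_sample=upper_no_punct(mixed_case),  # ask for uppercase, no punctuation
)
print(example.style_prompt)  # WELL I NEVER THOUGHT OF THAT
```

Passing `mixed_case` unchanged as `style_sample` would instead request mixed-case output with punctuation.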

May 26 '23 03:05 marcoyang1998

@marcoyang1998 Great job! Is there any new progress?

Jul 25 '23 07:07 pingfengluo

@pingfengluo Hi, we've done a lot of experiments and the results are promising. Given the pre_text (e.g., transcriptions of the previous utterances), we are able to get a 5-10% relative word error rate reduction (WERR) on our new dataset libriheavy (#1175). The model can also change the style of its output given a "style prompt".

Once we finalize the model, we will post an update here.
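
For readers unfamiliar with the metric, relative WERR is the fraction of the baseline word error rate that is removed. The sketch below shows the arithmetic only; the function name `relative_werr` and the WER values are hypothetical placeholders, not results from these experiments.

```python
def relative_werr(baseline_wer: float, new_wer: float) -> float:
    """Relative word error rate reduction: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Illustrative numbers only, NOT measurements from this thread:
# a baseline WER of 4.0% and a prompted-model WER of 3.7%.
reduction = relative_werr(baseline_wer=4.0, new_wer=3.7)
print(f"relative WERR: {reduction:.1%}")
```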

Jul 25 '23 07:07 marcoyang1998

@marcoyang1998 Cool! Looking forward to your results. Speech prompts have great potential for solving multiple speech-related tasks, including speech recognition, punctuation prediction, timestamp prediction, and so on, as Whisper does. Speech prompts could also be a way to support large-scale multimodal models that integrate speech and text, something like AudioPaLM (or SpeechGPT).

Jul 25 '23 08:07 pingfengluo
