fstalign

Filler words

Open naymaraq opened this issue 3 years ago • 3 comments

Hi

Do you plan to add a flag to disable filler words (like um, uh)?

naymaraq avatar May 11 '22 07:05 naymaraq

We may add that flag eventually, but it is not in our immediate plans. For now, we just remove any unwanted tokens from the transcripts themselves.
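A minimal sketch of that workaround, assuming plain-text transcripts with whitespace-separated tokens. The `strip_fillers` helper and the `FILLERS` list are hypothetical and are not part of fstalign:

```python
import re

# Hypothetical filler list (an assumption); extend it to match your own transcripts.
FILLERS = {"um", "uh"}

def strip_fillers(text, fillers=FILLERS):
    """Drop whitespace-separated tokens that are filler words.

    Matching ignores case and any punctuation attached to the token edges.
    """
    kept = []
    for token in text.split():
        # Strip leading/trailing non-word characters before comparing.
        core = re.sub(r"^\W+|\W+$", "", token).lower()
        if core not in fillers:
            kept.append(token)
    return " ".join(kept)

print(strip_fillers("Um, so uh we had a good quarter"))
```

Applying the same filter to both the reference and the hypothesis transcript before alignment keeps the comparison consistent.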

qmac avatar May 11 '22 15:05 qmac

@qmac In the paper (https://arxiv.org/pdf/2104.11348v3.pdf), the reported WER is 11.3. Does this include filler words? Is there a script I can use to reproduce the paper's result from the Rev .nlp output files (https://github.com/revdotcom/speech-datasets/tree/main/earnings21/output/rev)?

naymaraq avatar May 12 '22 07:05 naymaraq

@naymaraq Yes, it does include filler words. Let me see if we can find that script.

qmac avatar May 12 '22 19:05 qmac