OpenSeq2Seq
WER is very high for telephone audio
WER is very high for phone recordings. Could you please help us improve the accuracy of speech-to-text (S2T)?
What dataset do you use?
Hi Boris,
I use phone call recordings of speakers with a New Zealand accent, and the recordings are sampled at 8000 Hz. To improve performance, I used the original language model mentioned in the repo. It increases the accuracy a little, but it takes ages to process a single sample.
Cheers, Sunny
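A sample-rate mismatch between the recordings and the audio the model was trained on is one thing worth ruling out with 8000 Hz telephone data. The sketch below reads the rate from a WAV header and compares it with an expected rate. The 16000 Hz default, the helper names, and the in-memory demo file are all assumptions for illustration; the real expected rate should come from your model's configuration.

```python
import io
import wave


def wav_sample_rate(source):
    """Return the sample rate in Hz stored in a WAV header.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def needs_resampling(source, expected_hz=16000):
    """Return True if the file's rate differs from the expected rate.

    expected_hz=16000 is an assumed default, not a value taken from
    the thread; check the model config for the real one.
    """
    return wav_sample_rate(source) != expected_hz


def make_silent_wav(rate_hz, seconds=1):
    """Build an in-memory mono 16-bit WAV of silence (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * rate_hz * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    phone_call = make_silent_wav(8000)
    print(needs_resampling(phone_call))
```

If the check reports a mismatch, the audio would need to be resampled before inference, or the model fine-tuned on audio at the recordings' native rate.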
What is the value of your WER? Did you train your own model on your data?
Hi Gabriel,
My WER is around 40%, which is strangely high. Towards the end of the transcript there are no spaces between words, and inference takes more than half an hour.
Thanks a lot
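For reference, WER is usually computed as the word-level edit distance between the reference and the hypothesis, divided by the number of reference words. The sketch below is a minimal implementation of that standard definition, not the scoring code used by OpenSeq2Seq; the function name and the example sentences are made up for illustration. It also shows how words merged without spaces inflate the score.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: the last three words are merged into one token,
    # which counts as one substitution plus two deletions.
    reference = "please call me back later today"
    hypothesis = "please call me backlatertoday"
    print(word_error_rate(reference, hypothesis))  # 0.5
```

Because every merged run of words is scored as several errors, the missing spaces at the end of the transcript may account for part of the 40% figure on their own.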
I don't think you will get good results without fine-tuning your model.