whisper_normalizer
A Python package for the Whisper normalizer
**Is your feature request related to a problem? Please describe.** - Abhijit Neil Abhraham suggested dockerizing our Python package for easier commercial use.
**Describe the bug** - GitHub Actions are failing.
- Include cleanup module before actual normalization
- Include non-standard spelling normalization
- How to deal with numbers, spell out cardinals, ordinals, abbreviations, etc.
The Whisper `BasicTextNormalizer` (https://kurianbenoy.github.io/whisper_normalizer/basic.html#basictextnormalizer) seems to be widely shown to be a problem in almost all languages except English. See: https://github.com/huggingface/transformers/issues/20703 In Malayalam and most Indic languages, it's...
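As a rough illustration of why category-based stripping can damage Indic text, here is a minimal sketch using only the Python standard library. `strip_marks_and_symbols` is a hypothetical helper written for this example, not this package's API. It assumes the normalizer replaces every character whose Unicode general category starts with M (marks), S (symbols), or P (punctuation) with a space, which may not match the real implementation exactly.

```python
import unicodedata


def strip_marks_and_symbols(text: str) -> str:
    """Hypothetical helper (an assumption, not the package's code).

    Replaces every character whose Unicode general category starts with
    M (mark), S (symbol) or P (punctuation) with a single space.
    """
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(
        " " if unicodedata.category(ch)[0] in "MSP" else ch
        for ch in normalized
    )


english = "Hello, world!"
malayalam = "മലയാളം"  # the word "Malayalam" written in Malayalam script

# For English, only punctuation is replaced and the words stay intact.
print(repr(strip_marks_and_symbols(english)))

# For Malayalam, vowel signs and the anusvara are combining marks
# (category M), so they are replaced too and the word is broken apart.
print(repr(strip_marks_and_symbols(malayalam)))
```

Under this assumption the English words survive unchanged, while the Malayalam word loses its dependent vowel sign and anusvara, which changes what the text says rather than just tidying it.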
- Add detailed test cases
- Add and test how to use the normalizer in each of the Indian languages
Hi, I tried using your normalizer to help calculate WER for my personal use case. I have a ground truth like so: ``` JUNE THIRD EIGHTEEN SEVENTY ONE OBOCOCK...
You folks might be interested in https://github.com/google-research/nisaba, which has normalization rules for many Indic languages; they have taken an FST-based approach to normalization. This is just an...