commonvoice-utils
Linguistic processing for Common Voice
This PR fixes #51:
- Adds a `quiet` parameter to the main functions for batch processing; defaults to `False`
- Some formatting and pylint suggestions
If you use the code directly from Python, you still get "Function not implemented" errors. I do check for existing functionality before calling the functions, but this time, when using...
I would like to be able to count syllables and segment words by syllables, e.g. caltlamachtiloyan → cal·tla·mach·til·oy·an and camioneta → ca·mio·ne·ta.
- [ ] Phon
- [ ] Valid
- [ ] Alphabet
- [ ] Segment
Is it possible to implement optional "--exclude-xxx fn" flags to exclude recordings during cv export? `--exclude-voices voices.txt` // E.g. to measure the effect of a single person recording too...
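The filtering requested here could look roughly like the sketch below. It is not the tool's implementation: `load_exclusions` and `filter_rows` are hypothetical helper names, and the code assumes the exclusion file lists one speaker ID per line and that each recording row carries a `client_id` column, as in Common Voice TSV exports.

```python
import csv
import io


def load_exclusions(lines):
    """Return the set of non-empty, stripped entries (one ID per line)."""
    return {line.strip() for line in lines if line.strip()}


def filter_rows(rows, excluded_voices):
    """Yield only the rows whose `client_id` is not excluded (assumed column name)."""
    for row in rows:
        if row.get("client_id") not in excluded_voices:
            yield row


# In-memory stand-in for an exported TSV; real data would be read from a file.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\n"
    "spk1\ta.mp3\thello\n"
    "spk2\tb.mp3\tworld\n"
    "spk1\tc.mp3\tagain\n"
)

reader = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
excluded = load_exclusions(["spk1\n", "\n"])  # stand-in for the lines of voices.txt
kept = list(filter_rows(reader, excluded))
print([row["path"] for row in kept])
```

Keeping the exclusion list as a set makes each lookup constant time, which matters when an export contains many recordings.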
Although Korean is not fully enabled on Common Voice yet, it only lacks 1500 sentences. If added, we can start using alphabet/normalization support provided by covo.
Perhaps something like `PASS` to basically return whatever was input and `REPL` for removing punctuation. Another option would be something like `CB` for check Unicode Block.
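One way to picture the three suggested operations is the sketch below. The function names are hypothetical and none of this is existing covo behaviour. Python's standard library has no Unicode block lookup, so the block check here takes an explicit code point range as a stand-in for a block name.

```python
import unicodedata


def op_pass(text):
    """PASS: return the input unchanged."""
    return text


def op_repl(text):
    """REPL: drop every character in a Unicode punctuation category (P*)."""
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )


def op_cb(text, start, end):
    """CB: check that all non-whitespace characters lie in [start, end]."""
    return all(start <= ord(ch) <= end for ch in text if not ch.isspace())


# The Thai block spans U+0E00..U+0E7F.
THAI_BLOCK = (0x0E00, 0x0E7F)

print(op_pass("Hello, world!"))
print(op_repl("Hello, world!"))
print(op_cb("สวัสดี", *THAI_BLOCK))
print(op_cb("hello", *THAI_BLOCK))
```

A range check like this is stricter than a script check: characters shared across scripts, such as digits or common punctuation, fall outside the block and would need to be handled separately.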
Either something like [thai segmenter](https://pypi.org/project/thai-segmenter/) or maybe [sentence piece](https://github.com/google/sentencepiece).
https://github.com/kscanne/filiocht