gruut
Add Slovak (sk) language
I would like to suggest adding dataset.txt, a list of 24865 hand-reviewed Slovak words. Which license would be preferable for the gruut project? I am the author and can release it under any license you prefer.
https://github.com/neurlang/toipa/tree/master/sk2ipa
Fixes which would be needed:
- remove the ' character
- replace θ with c
- add spaces between phonemes
- remove words that map to the A / F placeholder
The cleaned entries would then be loaded into the word_phonemes table of lexicon.db.
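To make the cleanup concrete, here is a rough Python sketch of the four fixes listed above. It makes several assumptions: the `'` to remove is the plain ASCII apostrophe, the placeholders are the literal uppercase letters `A` and `F`, and a phoneme is one character plus any combining diacritics or length marks that follow it. The function name `clean_pronunciation` is made up for illustration and is not part of gruut. Multi-character phonemes such as affricates or diphthongs would need the real phoneme inventory to be split correctly.

```python
import unicodedata
from typing import Optional

# Assumption: placeholders appear as the literal uppercase letters A and F.
PLACEHOLDERS = ("A", "F")
# Assumption: length marks belong to the preceding phoneme.
ATTACHING_MARKS = ("ː", "ˑ")


def clean_pronunciation(ipa: str) -> Optional[str]:
    """Return a space-separated phoneme string, or None if the entry should be dropped."""
    # Drop entries that still contain a placeholder.
    if any(p in ipa for p in PLACEHOLDERS):
        return None

    # Remove the ' character and replace θ with c.
    ipa = ipa.replace("'", "").replace("θ", "c")

    # Split into phonemes: combining diacritics and length marks
    # are attached to the previous character instead of starting a new phoneme.
    phonemes = []
    for ch in ipa:
        if ch.isspace():
            continue
        if phonemes and (unicodedata.combining(ch) or ch in ATTACHING_MARKS):
            phonemes[-1] += ch
        else:
            phonemes.append(ch)

    return " ".join(phonemes) if phonemes else None
```

Each line of dataset.txt would be passed through this function, and entries that return `None` would be skipped before loading.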
What is the g2p_alignments table for?
I can also generate a larger dictionary using the neural network (up to 300k words), but it could contain mistakes.