
Add Slovak (sk) language

Status: Open. neurlang opened this issue 2 years ago • 0 comments

I would like to suggest adding dataset.txt, a set of 24865 hand-reviewed Slovak words. Which license would be preferable for the gruut project? I am the author and can release it under any license you prefer.

https://github.com/neurlang/toipa/tree/master/sk2ipa

Fixes which would be needed:

  1. remove the ' character
  2. replace θ with c
  3. add spaces between phonemes
  4. remove words which map to the A / F placeholder
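
The four fixes above could be sketched roughly as follows. This is only an illustration under assumptions: the helper names (`split_phonemes`, `clean_entry`) are hypothetical, the placeholder is assumed to appear as a literal capital `A` or `F` in the pronunciation, and phoneme splitting is assumed to be per character, with combining marks, length marks, and tie bars kept attached to their base symbol.

```python
import unicodedata

# Assumption: the A / F placeholders show up as literal capitals in the IPA string.
PLACEHOLDERS = ("A", "F")
TIE_BARS = {"\u0361", "\u035c"}   # tie bars join the next symbol to the previous one
LENGTH_MARKS = {"ː", "ˑ"}          # length marks stay with the preceding symbol


def split_phonemes(ipa):
    """Split an unspaced IPA string into a list of phoneme strings."""
    phonemes = []
    join_next = False
    for ch in ipa:
        attach = unicodedata.combining(ch) > 0 or ch in LENGTH_MARKS or join_next
        if phonemes and attach:
            phonemes[-1] += ch
        else:
            phonemes.append(ch)
        join_next = ch in TIE_BARS
    return phonemes


def clean_entry(word, ipa):
    """Apply fixes 1-4; return (word, spaced phonemes) or None if the entry is dropped."""
    ipa = ipa.replace("'", "")      # 1. remove the ' character
    ipa = ipa.replace("θ", "c")     # 2. replace θ with c
    if any(p in ipa for p in PLACEHOLDERS):
        return None                 # 4. drop words that map to a placeholder
    return word, " ".join(split_phonemes(ipa))  # 3. add spaces between phonemes


print(clean_entry("mama", "'mama"))  # ('mama', 'm a m a')
```

Entries for which `clean_entry` returns `None` would simply be skipped when building the lexicon.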

Then they would be loaded into the lexicon.db word_phonemes table.
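
For illustration, loading the cleaned entries might look like the sketch below. The column layout of `word_phonemes` used here (`word`, `pron_order`, `phonemes`, `role`) is my assumption and should be checked against the actual lexicon.db schema; `load_lexicon` is a hypothetical helper, and the demo uses an in-memory database rather than a real lexicon.db.

```python
import sqlite3


def load_lexicon(conn, entries):
    """Insert (word, phonemes) pairs into word_phonemes; return the row count."""
    # Assumed schema; verify against the real lexicon.db before using.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_phonemes ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "word TEXT, pron_order INTEGER, phonemes TEXT, role TEXT)"
    )
    seen = {}   # word -> number of pronunciations inserted so far
    rows = []
    for word, phonemes in entries:
        order = seen.get(word, 0)
        seen[word] = order + 1
        rows.append((word, order, phonemes, ""))  # empty role: no word-role tag
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO word_phonemes (word, pron_order, phonemes, role) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
count = load_lexicon(conn, [("mama", "m a m a"), ("oko", "o k o")])
print(count)  # 2
```

Words with several pronunciations would get increasing `pron_order` values in the order they appear in the input.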

What is the g2p_alignments table for?

I can also generate a larger dictionary (up to 300k words) using the neural network, but its entries are not hand-reviewed and could contain mistakes.

neurlang, Nov 11 '23 11:11