ipatok
Keep stress symbols?
Is there a way for the tokeniser to keep the stress symbols in the IPA transcription?
Not at the moment, but I would be happy to fix this. The question is whether:
- a stress marker should be a token on its own; for example:
    >>> tokenise('ˈkeˌke')
    ['ˈ', 'k', 'e', 'ˌ', 'k', 'e']
- or it should be combined with the first letter of its syllable; for example:
    >>> tokenise('ˈkeˌke')
    ['ˈk', 'e', 'ˌk', 'e']
I assume the first option would make it easier to process the output further, but I have not worked with stress, so if you say otherwise I would implement it accordingly.
I would prefer the first solution where they are tokens on their own.
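For illustration, the two options discussed above can be sketched with a toy splitter that emits one token per character. This is not ipatok's implementation: `split_with_stress`, its `attach` parameter, and `STRESS_MARKS` are hypothetical names, and the sketch ignores diacritics, diphthongs, and everything else a real IPA tokeniser has to handle.

```python
# Hypothetical sketch, not part of ipatok: primary (U+02C8) and
# secondary (U+02CC) stress marks.
STRESS_MARKS = {'ˈ', 'ˌ'}


def split_with_stress(word, attach=False):
    """Split a string into one token per character, keeping stress marks.

    attach=False: each stress mark is a token on its own (first option).
    attach=True: each stress mark is prefixed to the next letter (second option).
    """
    tokens = []
    pending = ''  # stress mark(s) waiting for the next letter
    for char in word:
        if char in STRESS_MARKS:
            if attach:
                pending += char
            else:
                tokens.append(char)
        else:
            tokens.append(pending + char)
            pending = ''
    if pending:
        # A trailing stress mark has no letter to attach to; keep it as is.
        tokens.append(pending)
    return tokens


separate = split_with_stress('ˈkeˌke')
attached = split_with_stress('ˈkeˌke', attach=True)

# With stand-alone stress tokens, dropping stress is a simple filter.
without_stress = [t for t in separate if t not in STRESS_MARKS]
```

With stress marks as separate tokens, later steps can filter or regroup them without inspecting the inside of each token, which is one reason the first option may be easier to process further.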