ipatok

Keep stress symbols?

Open dreamk73 opened this issue 1 year ago • 2 comments

Is there a way for the tokeniser to keep the stress symbols in the IPA transcription?

dreamk73, Aug 28 '24 19:08

Not at the moment, but I would be happy to fix this. The question is whether:

  • a stress marker should be a token on its own; for example:

    >>> tokenise('ˈkeˌke')
    ['ˈ', 'k', 'e', 'ˌ', 'k', 'e']
    
  • or it should be combined with the first letter of its syllable; for example:

    >>> tokenise('ˈkeˌke')
    ['ˈk', 'e', 'ˌk', 'e']
    

I assume the first option would make the output easier to process further, but I have not worked with stress myself, so if you prefer the second option I will implement that instead.
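To make the two options concrete, here is a minimal standalone sketch. The name `split_stress` and its `separate` flag are hypothetical and only for illustration; this is not ipatok's implementation or API. It also naively treats every non-stress character as its own token, so it ignores diacritics, length marks, and the like.

```python
# Primary (U+02C8) and secondary (U+02CC) IPA stress marks.
STRESS_MARKS = {'ˈ', 'ˌ'}


def split_stress(string, separate=True):
    """Hypothetical helper: split a string into one-character tokens.

    separate=True  -> each stress mark becomes a token of its own
    separate=False -> each stress mark is glued to the token that follows it
    """
    tokens = []
    pending = ''  # stress marks waiting to be attached (separate=False only)

    for char in string:
        if char in STRESS_MARKS:
            if separate:
                tokens.append(char)
            else:
                pending += char
        else:
            tokens.append(pending + char)
            pending = ''

    # A trailing stress mark has nothing to attach to; keep it as is.
    if pending:
        tokens.append(pending)

    return tokens


option_one = split_stress('ˈkeˌke', separate=True)
option_two = split_stress('ˈkeˌke', separate=False)

# With the first option, dropping stress again is a simple filter.
without_stress = [t for t in option_one if t not in STRESS_MARKS]
```

With separate tokens, further processing (removing stress, or locating it by index) needs no string slicing, which is the reasoning behind the assumption above.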

pavelsof, Sep 02 '24 13:09

I would prefer the first solution, where the stress markers are tokens on their own.

dreamk73, Sep 11 '24 21:09