
Issue with <phoneme> SSML tag

Open Bharath-Kumar-3231 opened this issue 4 years ago • 3 comments

Hi, I am trying to use the phoneme tag in SSML text, but the output phones are split into individual characters:

from gruut import sentences

ssml_text = """
  <phoneme ph="aːˈb" alphabet="ipa">ab</phoneme>"""

for sent in sentences(ssml_text, ssml=True, espeak=True):
    for word in sent:
        if word.phonemes:
            print(word.text, word.phonemes)

This is the output I get:

ab ['a', 'ː', 'ˈ', 'b']

but the expected output phones for the word ab are:

ab ['aː', 'ˈb']

I only followed the steps in the documentation. Am I doing something wrong, or is this an issue with gruut itself?

Thank you

Bharath-Kumar-3231, Nov 25 '21 11:11

According to the documentation, gruut is supposed to intelligently split the given IPA phonemes, but that isn't happening:

<phoneme ph="..."> - supply phonemes for inner text
    alphabet - if "ipa", phonemes are intelligently split ("aːˈb" -> "aː", "ˈb")

Bharath-Kumar-3231, Dec 03 '21 07:12

Hi @Bharath-Kumar-3231, this is a good question. I originally did the "intelligent" split on phonemes, but had problems when the system using gruut needed them split differently -- for example, seeing as a single phoneme.

So now you can separate phonemes by whitespace and gruut will respect that:

<phoneme ph="aː ˈb" alphabet="ipa">ab</phoneme>

should produce the expected output.
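As a minimal sketch of the whitespace approach, the snippet below builds the tag from an explicit list of phonemes, so the grouping is decided by the caller rather than by gruut. The helper names phoneme_tag and print_phonemes are hypothetical and not part of gruut. The gruut call simply mirrors the one in the original report and is not verified here.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_tag(text, phonemes, alphabet="ipa"):
    """Build a <phoneme> element with whitespace-separated phonemes.

    Hypothetical helper: joins the phonemes with single spaces so the
    intended grouping is explicit in the `ph` attribute.
    """
    ph = " ".join(phonemes)
    return (
        f"<phoneme ph={quoteattr(ph)} alphabet={quoteattr(alphabet)}>"
        f"{escape(text)}</phoneme>"
    )


def print_phonemes(ssml_text):
    """Print each word and its phonemes, as in the original report."""
    # Imported lazily so the helper above works without gruut installed.
    from gruut import sentences

    for sent in sentences(ssml_text, ssml=True, espeak=True):
        for word in sent:
            if word.phonemes:
                print(word.text, word.phonemes)


# Same tag as suggested above, built from an explicit phoneme list.
ssml_text = phoneme_tag("ab", ["aː", "ˈb"])
```

Calling print_phonemes(ssml_text) runs the same loop as the report, this time with the whitespace-separated ph value.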

Let me know what your thoughts are. I felt this was my only option since some TTS systems consider elongation ː and primary stress ˈ as separate "phonemes".
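If a downstream system does want length and stress marks grouped with their base phoneme, the grouping can be done by the caller before building the ph attribute. The sketch below is an illustration only, not gruut's implementation: group_ipa and the two mark sets are assumptions, and they cover only a handful of IPA symbols.

```python
import unicodedata
from typing import List

# Assumption: stress marks attach to the phoneme that follows them.
PREFIX_MARKS = {"ˈ", "ˌ"}
# Assumption: length marks attach to the phoneme that precedes them.
SUFFIX_MARKS = {"ː", "ˑ"}


def group_ipa(ipa: str) -> List[str]:
    """Group an IPA string into phonemes, keeping marks with their base."""
    groups: List[str] = []
    pending = ""  # prefix marks waiting for the next base symbol
    for ch in ipa:
        if ch.isspace():
            continue  # whitespace is dropped; it is not treated as a boundary
        if ch in PREFIX_MARKS:
            pending += ch
        elif groups and not pending and (
            ch in SUFFIX_MARKS or unicodedata.combining(ch)
        ):
            # Length marks and combining diacritics join the previous group.
            groups[-1] += ch
        else:
            groups.append(pending + ch)
            pending = ""
    if pending:
        groups.append(pending)  # trailing stress mark with no base symbol
    return groups


grouped = group_ipa("aːˈb")   # grouped form
ph_value = " ".join(grouped)  # whitespace-separated value for ph="..."
flat = list("aːˈb")           # one entry per character, for comparison
```

Here grouped is ['aː', 'ˈb'] and flat is ['a', 'ː', 'ˈ', 'b'], which are the two splits discussed in this thread.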

synesthesiam, Dec 03 '21 15:12

The phoneme tag doesn't behave as expected. Continuing the example above:

from gruut import sentences

ssml_text = """
  <phoneme ph="aːˈb" alphabet="ipa">ab</phoneme>"""

for sent in sentences(ssml_text, ssml=True, espeak=True):
    for word in sent:
        if word.phonemes:
            print(word.text, word.phonemes)

prints ab ['ˈ', 'b'] instead of the expected ['aː', 'ˈb'].

liaeh, Sep 27 '22 10:09