NaturalVoiceSAPIAdapter

Reading the full-stop at the ends of paragraphs

Open HenryLoenwind opened this issue 10 months ago • 6 comments

Using local voice Sonia on Windows 10, I find that it reads the full-stop at the end of paragraphsDOT

This is mildly annoying and not something that happens with the standard Windows voicesDOT

It doesn't happen when a paragraph ends with a ?, ", or something like that. It also doesn't happen in the middle of a paragraph or with single-line inputs (like the preview in the system settings). I suspect this is an artefact of the way the text is transformed?

HenryLoenwind avatar Feb 05 '25 19:02 HenryLoenwind

What is the application you are using to read the text?

gexgd0419 avatar Feb 06 '25 01:02 gexgd0419

> What is the application you are using to read the text?

Scrivener. http://www.literatureandlatte.com

PS: You can try this out directly from Edit>Speech>Settings:

Image

The test paragraph there ends in a full-stopDOT

HenryLoenwind avatar Feb 09 '25 07:02 HenryLoenwind

Update. I just remembered there was a loglevel...

Image

Using a hex editor:

Image

or:

Image

Sadly, I only see the SSML, not the input data. I'd say this is caused by the paragraph breaks being converted into, um, something. I cannot say whether Scrivener does that or whether it's part of the SAPI-to-SSML conversion. Either way, as this character works with SAPI voices, I'd say it needs to be converted into something that doesn't trigger the preceding "." to be read, e.g. by adding a space character.

(EDIT: When I said "works" above, I meant that SAPI sees the character as whitespace, not that it speaks it as a proper paragraph break. With the second example, SAPI still says "twelveThree", instead of "twelve. Three" as a human would.)

As a luxury feature, replacing it with a short pause (e.g. <break time="150ms"/>) would be nice, as paragraphs running into each other has always annoyed me.
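The pause idea above could be sketched roughly as follows. This is only an illustration in Python, not the adapter's actual code: the function name `paragraphs_to_ssml_body` and the `pause_ms` parameter are made up for this example, and whether a given voice honours `<break>` is an assumption.

```python
import re
from xml.sax.saxutils import escape

# Ordinary newlines plus U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR.
_PARAGRAPH_BREAK = re.compile(r"[\r\n\u2028\u2029]+")

def paragraphs_to_ssml_body(text, pause_ms=150):
    """Hypothetical helper: join paragraphs with an SSML <break/> element."""
    parts = [p.strip() for p in _PARAGRAPH_BREAK.split(text)]
    # Escape XML special characters so the text is safe inside SSML.
    parts = [escape(p) for p in parts if p]
    return f' <break time="{pause_ms}ms"/> '.join(parts)

print(paragraphs_to_ssml_body("Ten, eleven, twelve.\u2029Three & four."))
```

The spaces around the `<break/>` element also ensure that the full stop is followed by whitespace rather than by the first letter of the next paragraph.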

HenryLoenwind avatar Feb 12 '25 20:02 HenryLoenwind

PS: There's another small issue: It seems changing the loglevel disables logging completely until the host application is restarted. I haven't run further tests into that, though.

HenryLoenwind avatar Feb 12 '25 20:02 HenryLoenwind

I am able to reproduce this issue with the current version of Scrivener and the latest version of NaturalVoiceSAPIAdapter.

Debugging shows that Scrivener uses the character \x2029 to separate paragraphs, instead of the more common \n or \r\n. \x2029 (U+2029) is the Unicode "Paragraph Separator" character. The voice doesn't recognize it and ignores it, which leaves no separator between the period and the next paragraph, so the period is pronounced as "dot".
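For anyone who wants to check what separators an application actually passes along, a small inspection helper along these lines may help. It is a sketch in Python rather than anything from the adapter, and `find_separators` is a hypothetical name; it relies on the Unicode general categories `Zl` (line separator) and `Zp` (paragraph separator).

```python
import unicodedata

def find_separators(text):
    """Hypothetical helper: list (index, code point, name) of Zl/Zp characters."""
    hits = []
    for i, ch in enumerate(text):
        # Zl = line separator (U+2028), Zp = paragraph separator (U+2029).
        if unicodedata.category(ch) in ("Zl", "Zp"):
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch)))
    return hits

print(find_separators("twelve.\u2029Three"))
# [(7, 'U+2029', 'PARAGRAPH SEPARATOR')]
```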

I have no idea why Scrivener does this. Using \x2029 is semantically correct, but in practice, the system's built-in robotic voices don't recognize this character either, and will also pronounce "dot" in this case.

More interestingly, in the Text to Speech Settings dialog, if you put the caret at the end of the text and click Speak, then lines will be separated by \n and will be spoken correctly. If you put the caret in the middle of the text, only the part after the caret will be spoken, and lines will be separated by \x2029 in this case.

Since these are just Unicode characters that separate paragraphs and lines (\x2028 is the corresponding line separator), I guess that replacing them with \n would be fine.
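A minimal sketch of that replacement, written in Python purely for illustration (the adapter itself may do this differently, and `normalize_separators` is a hypothetical name):

```python
# Map U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR to a plain newline.
_SEPARATORS = str.maketrans({"\u2028": "\n", "\u2029": "\n"})

def normalize_separators(text):
    """Hypothetical helper: replace Unicode line/paragraph separators with \\n."""
    return text.translate(_SEPARATORS)

print(repr(normalize_separators("twelve.\u2029Three")))
# 'twelve.\nThree'
```

With the separator turned into whitespace, the period is no longer glued to the first word of the next paragraph.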

gexgd0419 avatar Aug 08 '25 13:08 gexgd0419

I guess they are using those because the underlying format is RTF, which supports line feeds and paragraphs as two distinct things, like HTML with <br> and </p>. As this distinction needs to be preserved internally until the output transformation, using those Unicode code points makes sense.

HenryLoenwind avatar Oct 16 '25 00:10 HenryLoenwind