gexgd0419 comments

Results: 114 comments of gexgd0419

Reading the full-stop at the ends of paragraphs

I am able to reproduce this issue with the current version of Scrivener and the latest version of NaturalVoiceSAPIAdapter. Debugging shows that Scrivener uses the character `\u2029` (U+2029, PARAGRAPH SEPARATOR) to separate paragraphs,...

Feature request: Add a settings page for managing installed OCR languages

You can try the [DISM API](https://learn.microsoft.com/en-us/windows-hardware/manufacture/desktop/dism/dism-api-reference?view=windows-11) directly, which I think is what those PowerShell commands use under the hood.

Feature request: Add a settings page for managing installed OCR languages

Here's some code that uses the `Microsoft.Dism` NuGet package to use the DISM APIs. ```c# using Microsoft.Dism; using System.Globalization; namespace TestDismCSharp { internal class Program { static void ProgressCallback(DismProgress progress)...

Feature request: Add a settings page for managing installed OCR languages

Note that by default, it throws `DismRebootRequiredException` when the operation succeeds but requires a reboot to take effect. You can also replace: ```c# using (var sess = DismApi.OpenOnlineSession()) ``` with:...

Does not work in various apps like Discord and Thorium Reader

For now, Chromium/Electron apps cannot use the voices from this engine. In fact, they cannot use any third-party SAPI5 voice. They only support WinRT/OneCore voices, which...

Does not work in various apps like Discord and Thorium Reader

A temporary workaround is to copy the registry key `HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech\Voices\TokenEnums` to `HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Speech_OneCore\Voices\TokenEnums`. You can use the following PowerShell code. Open Windows PowerShell with administrator privileges, paste the following line and...

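Suggestion: develop a SAPI5 adapter for AI TTS

I suspect the offline Microsoft natural voices actually use AI models as well, because synthesis takes a noticeable amount of CPU, and the SDK files include one named `Microsoft.CognitiveServices.Speech.extension.onnxruntime.dll`, which means it very likely uses onnxruntime. Fortunately, the SDK itself is small, and although the original does not support older versions of Windows, it can run on Windows 7 with the YY-Thunks compatibility layer plus a patched import table. At the moment the project mainly supports Microsoft's natural voices (it started only because I had dug out the key and wanted to build a POC to prove feasibility), but I think adding other kinds of "natural voices" is worth considering, since the parts related to the SAPI5 framework can be shared. There are some issues to research and discuss, though, because I currently don't know much about AI voices. First of all, SAPI5 voices are in-process...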

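Suggestion: develop a SAPI5 adapter for AI TTS

Another issue: some AI voices may not report the timing of each spoken word, so WordBoundary and Bookmark events cannot be generated, yet some programs rely on these events to track reading progress.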

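Suggestion: develop a SAPI5 adapter for AI TTS

How well do these voices support SSML and events? How do they report speech progress? Or can they only synthesize audio, without reporting progress? I'd like to support as many SAPI5 features as possible. Microsoft's voices support these fairly well. Edge voices don't support bookmark events, but those can at least be emulated from the timing of each word. Not every SAPI5 feature is required, of course, but there will always be programs that depend on some of them. If there is really no way to get progress feedback, the only option may be to split the text into sentences and synthesize them one by one, which at least gives sentence-level progress. That in turn requires working out how to segment sentences in each language.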
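As a rough illustration of emulating bookmark events from per-word timings, here is a minimal Python sketch. It assumes the engine reports, for every word, a character offset into the input text and a start time in the audio; `WordTiming` and `bookmark_time_ms` are hypothetical names made up for this example, and this is not necessarily how NaturalVoiceSAPIAdapter does it.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class WordTiming:
    """Hypothetical per-word timing as an engine might report it."""
    text_offset: int      # character offset of the word in the input text
    audio_offset_ms: int  # time at which the word starts in the audio


def bookmark_time_ms(words, bookmark_offset, audio_length_ms):
    """Estimate when a bookmark at bookmark_offset should fire.

    The bookmark fires when the first word at or after it starts.
    If no word follows the bookmark, it fires at the end of the audio.
    `words` must be sorted by text_offset.
    """
    offsets = [w.text_offset for w in words]
    i = bisect_left(offsets, bookmark_offset)
    if i < len(words):
        return words[i].audio_offset_ms
    return audio_length_ms


# Three words starting at characters 0, 6 and 12 of the input text.
words = [WordTiming(0, 0), WordTiming(6, 400), WordTiming(12, 900)]
print(bookmark_time_ms(words, 7, 1500))
```

A bookmark that sits between two words is tied to the next word here; tying it to the end of the previous word would be an equally reasonable choice if the engine also reports word durations.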

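Suggestion: develop a SAPI5 adapter for AI TTS

I looked at the [implementation](https://github.com/kovidgoyal/calibre/blob/4e097f7e4b5a3399051ff36658d0a53cc379e56d/src/calibre/gui2/tts/piper.py#L275) in [calibre](https://github.com/kovidgoyal/calibre) (an e-book manager with built-in piper voices). After starting the piper process, it uses the [ICU](https://github.com/unicode-org/icu) library to [split the text into sentences](https://github.com/kovidgoyal/calibre/blob/4e097f7e4b5a3399051ff36658d0a53cc379e56d/src/calibre/spell/break_iterator.py#L109) and then synthesizes them one at a time, which gives it sentence-level reading progress. In addition, presumably to keep synthesis quality acceptable, it merges sentences that are too short and further splits sentences that are too long at word boundaries. As for volume, rate, and pitch adjustment, it looks like those also have to be implemented by hand.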
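The chunking strategy described above (split into sentences, merge short ones, break long ones at word boundaries) could look roughly like the following Python sketch. It is not calibre's code: a naive punctuation regex stands in for ICU sentence segmentation, and the names `chunk_text` and `_split_long` and the `min_len`/`max_len` thresholds are assumptions chosen for illustration.

```python
import re

# Naive stand-in for ICU sentence segmentation (an assumption, not what
# calibre uses): a sentence is a run of text ending in ., !, ? or their
# CJK counterparts, plus any trailing whitespace.
_SENTENCE = re.compile(r'[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+')


def _split_long(sentence, start, max_len):
    """Break an over-long sentence at spaces; hard-cut if none exist."""
    out = []
    while len(sentence) > max_len:
        cut = sentence.rfind(' ', 1, max_len)
        if cut <= 0:
            cut = max_len  # no space found (e.g. CJK text): hard cut
        else:
            cut += 1       # keep the space with the left-hand piece
        out.append((start, sentence[:cut]))
        start += cut
        sentence = sentence[cut:]
    if sentence:
        out.append((start, sentence))
    return out


def chunk_text(text, min_len=20, max_len=200):
    """Return (offset, chunk) pairs that cover `text` in order."""
    chunks = []
    pending = ''       # short sentences waiting to be merged
    pending_start = 0
    for m in _SENTENCE.finditer(text):
        if not pending:
            pending_start = m.start()
        pending += m.group()
        if len(pending) < min_len:
            continue   # too short: merge with the next sentence
        chunks.extend(_split_long(pending, pending_start, max_len))
        pending = ''
    if pending:        # flush whatever is left at the end
        chunks.extend(_split_long(pending, pending_start, max_len))
    return chunks


text = ("Hi. Go! This is a considerably longer sentence that should "
        "be split at a word boundary. Bye.")
for offset, chunk in chunk_text(text, min_len=10, max_len=40):
    print(offset, repr(chunk))
```

Because every chunk carries its offset into the original text, finishing a chunk can be reported as sentence-level progress. A real adapter would want a proper segmenter such as ICU's, since the regex above mishandles cases like abbreviations and decimal numbers.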