Rotem Dan issues

Results: 38 issues of Rotem Dan

Develop web-based UI

Currently, running the server (`echogarden serve`) and opening the localhost HTTP page (`http://localhost:45054`) shows a basic placeholder message (`"This is the Echogarden HTTP server!"`). Gradually, start developing a graphical...

feature

future

Finish initial development of browser extension

The browser extension, currently in development, already has the core functionality of communicating with the server, full word highlighting, and being able to speak starting at a selected page element....

feature

future

Investigate porting some of the engines to run in the browser

It is technically possible, overall, since the core components, `espeak-ng` and `onnxruntime`, both fully support running in the browser. Actually, `onnxruntime-web`, unlike `onnxruntime-node` (the currently used package), can also make...

feature

future

Finish development of new text language detection engine

The two current engines (`tinyld` and `fasttext`) aren't always accurate and sometimes produce odd or nonsensical classifications, like classifying English text as Klingon. I've developed a custom engine, based on...

feature

future

Rules for splitting sentences on punctuation characters in subtitles

Hi, I have a suggestion. When using the transcript alignment function, could the program try to split sentences at commas and periods as much as possible? Sometimes a sentence doesn't...

question
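
The splitting behavior requested above could be sketched roughly as follows. This is only an illustration of the idea, not how Echogarden segments subtitles: the function names `split_on_punctuation` and `group_into_cues` and the `max_chars` limit are hypothetical. It breaks after commas and sentence-ending punctuation, then greedily merges neighboring pieces so cues do not become too short.

```python
import re

# Break after a comma, period, question mark or exclamation mark,
# but only when whitespace follows (so "3.5" is left intact).
_BREAK_AFTER_PUNCTUATION = re.compile(r"(?<=[,.!?])\s+")


def split_on_punctuation(text: str) -> list[str]:
    """Split text into segments that each end at a punctuation mark."""
    return [seg for seg in _BREAK_AFTER_PUNCTUATION.split(text.strip()) if seg]


def group_into_cues(segments: list[str], max_chars: int) -> list[str]:
    """Greedily merge consecutive segments while the cue stays within max_chars."""
    cues: list[str] = []
    current = ""
    for segment in segments:
        candidate = f"{current} {segment}" if current else segment
        if current and len(candidate) > max_chars:
            # Adding this segment would overflow, so close the current cue.
            cues.append(current)
            current = segment
        else:
            current = candidate
    if current:
        cues.append(current)
    return cues


segments = split_on_punctuation("Hello, world. How are you?")
cues = group_into_cues(segments, max_chars=15)
```

A plain regular expression like this also breaks after abbreviations such as "Mr.", so a real implementation would likely need language-aware rules.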

Recognition: add support for OpenAI's cloud Whisper API

OpenAI provides a subscription-based cloud service that is able to transcribe speech using the largest Whisper model (`large-v2`): `https://api.openai.com/v1/audio/transcriptions` And translate speech using the same model: `https://api.openai.com/v1/audio/translations`...

recognition

feature
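
As a rough sketch of what a request to the two endpoints above might look like, the helper below only assembles the request description and does not send anything. The helper name `build_whisper_request` and the returned dictionary layout are hypothetical. The `whisper-1` model identifier, the `file` and `language` form fields, and the bearer-token header reflect my understanding of the public OpenAI API and should be checked against its current reference.

```python
from typing import Optional

API_BASE = "https://api.openai.com/v1/audio"
TASKS = ("transcriptions", "translations")


def build_whisper_request(
    task: str,
    api_key: str,
    audio_path: str,
    language: Optional[str] = None,
) -> dict:
    """Describe a multipart/form-data POST for the given task (nothing is sent)."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}, got {task!r}")

    # Assumption: the hosted Whisper model is addressed as "whisper-1".
    fields = {"model": "whisper-1"}

    # Assumption: a source-language hint is only meaningful for transcription,
    # since translation always targets English.
    if language is not None and task == "transcriptions":
        fields["language"] = language

    return {
        "method": "POST",
        "url": f"{API_BASE}/{task}",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "fields": fields,
        # The audio file is uploaded under the form field named "file".
        "file_field": ("file", audio_path),
    }


request = build_whisper_request(
    "transcriptions", api_key="sk-example", audio_path="speech.mp3", language="en"
)
```

The resulting description can be handed to any HTTP client that supports multipart uploads. Keeping request construction separate from sending makes it easy to test without network access.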

Synthesis: VITS voices have various pronunciation errors that can be fixed using lexicons

In the VITS and eSpeak engines, the text is converted to phonemes using the phoneme events produced by the eSpeak speech synthesizer during synthesis. eSpeak does a reasonable job in...

bug

synthesis

Synthesis: VITS voices have various issues related to model training

For example, when the default English voice (Amy / Low) gets an utterance that is a single word, like "two", it seems to mispronounce it as something that sounds closer...

bug

synthesis

external

CLI: option to control logging verbosity

Not very easy to implement at the moment. May require significant changes in many source files.

enhancement

cli

CLI / `speak-url`, `speak-wikipedia`: support accepting and parsing full Wikipedia article URLs

When a Wikipedia article URL like `https://en.wikipedia.org/wiki/Garden` is given to `speak-url` or `speak-wikipedia`, detect the article's language from the URL, and use the Wikipedia parsing package to get plain text...

enhancement

cli
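
The URL handling described in the last issue above could look roughly like the following. It is a minimal sketch using only the Python standard library, not Echogarden's implementation; `parse_wikipedia_url` and `WikipediaArticle` are hypothetical names. It assumes article URLs have the form `https://<lang>.wikipedia.org/wiki/<Title>`, optionally with the mobile `m.` subdomain.

```python
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlparse


class WikipediaArticle(NamedTuple):
    language: str  # language subdomain, e.g. "en"
    title: str     # article title, with underscores turned into spaces


def parse_wikipedia_url(url: str) -> Optional[WikipediaArticle]:
    """Return the language and title of a Wikipedia article URL, or None."""
    parsed = urlparse(url)
    host_parts = (parsed.hostname or "").lower().split(".")

    # The host must end in "wikipedia.org".
    if host_parts[-2:] != ["wikipedia", "org"]:
        return None

    # Ignore the mobile "m" subdomain; exactly one language label must remain.
    subdomains = [part for part in host_parts[:-2] if part != "m"]
    if len(subdomains) != 1 or subdomains[0] == "www":
        return None

    prefix = "/wiki/"
    if not parsed.path.startswith(prefix):
        return None
    raw_title = parsed.path[len(prefix):]
    if not raw_title:
        return None

    # Decode percent-escapes and restore spaces in the title.
    title = unquote(raw_title).replace("_", " ")
    return WikipediaArticle(language=subdomains[0], title=title)


article = parse_wikipedia_url("https://en.wikipedia.org/wiki/Garden")
```

Returning `None` for anything that is not recognizably an article URL lets the caller fall back to treating the input as a generic URL or a plain article name.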