Rotem Dan

Results 235 comments of Rotem Dan

I tested the same input audio with:

```
echogarden align audio.mp3 text.txt --plainText.paragraphBreaks=single
```

And there is definitely a significant improvement. I can see that the timing of each line...

You haven't made it clear whether you saw any improvement at all. Are you now referring to each subtitle cue appearing too early, or extending after the speech ends? They...

At 2:26.250, there is a 2 second pause, which is the longest in the audio: ![Screenshot_5](https://github.com/echogarden-project/echogarden/assets/8589488/97ecdb1d-7cf7-459e-9b87-01f523b1cbfd) For whatever reason the DTW alignment matched some of this pause to the beginning...

The reason the first cue includes the silence is slightly different. It's because the synthesized reference doesn't have any silence at the beginning, and the way DTW works is...
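To illustrate why leading silence can end up attached to the first cue, here is a minimal sketch of textbook DTW over one-dimensional "frames". It is not Echogarden's implementation: the function name `dtw_path`, the absolute-difference cost, and the toy feature values are all assumptions made for illustration. Because every source frame must be matched to some reference frame, and the path has to start at the first frame of both sequences, silent frames at the start of the source have nowhere to go except the first reference frame.

```python
from math import inf


def dtw_path(source, reference):
    """Return the minimum-cost DTW path as (source_index, reference_index) pairs."""
    n, m = len(source), len(reference)
    # cost[i][j] = minimal accumulated cost of aligning source[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            distance = abs(source[i - 1] - reference[j - 1])
            cost[i][j] = distance + min(
                cost[i - 1][j],      # advance in the source only
                cost[i][j - 1],      # advance in the reference only
                cost[i - 1][j - 1],  # advance in both
            )

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


# Hypothetical toy data: the source starts with three silent frames,
# while the synthesized reference starts directly with speech.
source = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0]
reference = [1.0, 2.0, 1.0]

path = dtw_path(source, reference)
# Source frames that were matched to the first reference frame.
frames_on_first_reference = [i for i, j in path if j == 0]
```

In this toy case the three silent frames and the first speech frame all map to reference frame 0, which is the same effect as a first cue that starts at the beginning of the audio rather than where speech begins.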

In `0.11.12` it now trims individual time ranges to remove preceding or following silence within mapped entries (mapped words, phones) after alignment. Silence detection currently uses a threshold of -40dB...
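As a rough sketch of what trimming a time range against a -40 dB threshold can look like, the following shrinks a sample range until neither end is below the threshold. This is a simplification under stated assumptions, not the code used in Echogarden: the name `trim_silence` is hypothetical, the check is done per sample rather than per frame, and -40 dB is interpreted relative to full scale (an amplitude of 0.01 when samples are in the range -1.0 to 1.0).

```python
def trim_silence(samples, start, end, threshold_db=-40.0):
    """Shrink the half-open range [start, end) so it neither begins nor ends in silence.

    Assumes `samples` are floats in [-1.0, 1.0] and that the threshold is
    expressed in dB relative to full scale.
    """
    threshold = 10 ** (threshold_db / 20)  # -40 dB -> 0.01 linear amplitude

    # Move the start forward past quiet samples.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Move the end backward past quiet samples.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return start, end


# Hypothetical word entry spanning six samples, with quiet edges.
samples = [0.0, 0.001, 0.5, -0.3, 0.002, 0.0]
trimmed = trim_silence(samples, 0, 6)
```

Here the range `(0, 6)` is reduced to `(2, 4)`, so only the two loud samples remain inside the entry. A range that is entirely below the threshold collapses to an empty range.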

You can use the slower `dtw-ra` engine (`--engine=dtw-ra`), which uses a speech recognition step, and works much better for audio that has background noise and music. By default it uses the...

I tried to run `echogarden align 166.wav 166.txt` (with no options) and it looked mostly accurate. By default it converts all line breaks to spaces and then processes it normally....

The reason I chose for the `word` mode not to include punctuation is that it's derived from timeline entries, which intentionally avoid having punctuation in words to allow these words to...

The models are loaded via `onnxruntime-node`, which is a Node.js binding for [Microsoft's ONNX runtime](https://onnxruntime.ai/). `onnxruntime-node` doesn't currently have GPU support on Node.js. This is currently [a working item for...

GPU support (DirectML and CUDA ONNX providers, and GPU build support for `whisper.cpp`) was added in later versions. Closing.