Rotem Dan comments

Results: 235 comments of Rotem Dan

Alignment: DTW may give inaccurate results due to silent or non-speech sections

I tested the same input audio with:

```
echogarden align audio.mp3 text.txt --plainText.paragraphBreaks=single
```

And there is definitely a significant improvement. I can see that the timing of each line...

Alignment: DTW may give inaccurate results due to silent or non-speech sections

You haven't made it clear whether you saw any improvement at all. Are you now referring to each subtitle cue appearing too early, or extending after the speech ends? They...

Alignment: DTW may give inaccurate results due to silent or non-speech sections

At 2:26.250, there is a 2 second pause, which is the longest in the audio: ![Screenshot_5](https://github.com/echogarden-project/echogarden/assets/8589488/97ecdb1d-7cf7-459e-9b87-01f523b1cbfd) For whatever reason the DTW alignment matched some of this pause to the beginning...

Alignment: DTW may give inaccurate results due to silent or non-speech sections

The reason the first cue includes the silence is slightly different. It's because the synthesized reference doesn't have any silence at the beginning, and the way DTW works is...
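The excerpt above is cut off, so the following is only a generic illustration of textbook DTW, not Echogarden's actual implementation. In the classic formulation, the warping path is pinned to start at the first frame of both sequences, so any leading silence in the source has to be assigned to some reference frame even when the reference has no silence. The toy energy values and the name `dtw_path` are assumptions made for this sketch.

```python
import numpy as np


def dtw_path(ref, src):
    """Textbook DTW over two 1-D feature sequences.

    Returns the warping path as a list of (ref_index, src_index) pairs.
    """
    n, m = len(ref), len(src)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0

    # Fill the accumulated-cost matrix.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(ref[i - 1] - src[j - 1])
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])

    # Backtrack from the last cell to the first cell.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


# Hypothetical per-frame energies: the reference starts with speech right away,
# while the source begins with three silent frames.
reference = [1.0, 1.0, 0.8]
source = [0.0, 0.0, 0.0, 1.0, 1.0, 0.8]

path = dtw_path(reference, source)
print(path)
```

Whatever route the path takes, it begins at `(0, 0)` and covers every source frame, so the silent frames at the start of the source necessarily end up mapped to reference frames near the beginning.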

Alignment: DTW may give inaccurate results due to silent or non-speech sections

In `0.11.12` it now trims individual time ranges to remove preceding or following silence within mapped entries (mapped words, phones) after alignment. Silence detection currently uses a threshold of -40dB...
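As a rough illustration of the idea described above, here is a minimal sketch of trimming a single time range against a -40 dB threshold. It is not Echogarden's code: the function names, the 10 ms frame length, and the choice to measure RMS level relative to full scale are all assumptions made for the example.

```python
import numpy as np


def frame_db(frame):
    """RMS level of a frame in dB relative to full scale (assumed convention)."""
    rms = np.sqrt(np.mean(np.square(frame))) if len(frame) else 0.0
    return 20.0 * np.log10(max(rms, 1e-10))


def trim_silence(samples, sample_rate, start, end, threshold_db=-40.0, frame_sec=0.01):
    """Shrink [start, end] (seconds) so it no longer begins or ends with silence."""
    hop = max(1, int(round(frame_sec * sample_rate)))
    lo = int(round(start * sample_rate))
    hi = int(round(end * sample_rate))

    # Advance the start while the leading frame is below the threshold.
    while lo + hop <= hi and frame_db(samples[lo:lo + hop]) < threshold_db:
        lo += hop

    # Pull the end back while the trailing frame is below the threshold.
    while hi - hop >= lo and frame_db(samples[hi - hop:hi]) < threshold_db:
        hi -= hop

    return lo / sample_rate, hi / sample_rate


# Synthetic test signal: 0.5 s silence, 1 s of a 100 Hz tone, 0.5 s silence.
sample_rate = 1000
tone = 0.5 * np.sin(2 * np.pi * 100 * np.arange(1000) / sample_rate)
samples = np.concatenate([np.zeros(500), tone, np.zeros(500)])

# Pretend alignment mapped a word to the whole 0.0 to 2.0 s range.
trimmed = trim_silence(samples, sample_rate, 0.0, 2.0)
print(trimmed)
```

In this sketch the trimming runs per mapped entry after alignment, so it only tightens ranges that alignment already produced and never moves them outside their original bounds.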

Alignment: DTW may give inaccurate results due to silent or non-speech sections

You can use the slower `dtw-ra` engine (`--engine=dtw-ra`), which uses a speech recognition step, and works much better for audio that has background noise and music. By default it uses the...

Alignment: DTW may give inaccurate results due to silent or non-speech sections

I tried to run `echogarden align 166.wav 166.txt` (with no options) and it looked mostly accurate. By default it converts all line breaks to spaces and then processes it normally....

Alignment: DTW may give inaccurate results due to silent or non-speech sections

The reason I chose for the `word` mode not to include punctuation is that it's derived from timeline entries, which intentionally avoid having punctuation in words to allow these words to...

is there an option to set it to run in the gpu?

The models are loaded via `onnxruntime-node`, which is a node.js binding for [Microsoft's ONNX runtime](https://onnxruntime.ai/). `onnxruntime-node` doesn't currently have GPU support on node.js. This is currently [a working item for...

is there an option to set it to run in the gpu?

GPU support (DirectML and CUDA ONNX providers, and GPU build support for `whisper.cpp`) was added in later versions. Closing.