multi-level alignment with "task_adjust_boundary_nonspeech_min"

Open bwang482 opened this issue 5 years ago • 2 comments

I want to align my audio recordings with their corresponding transcripts. The recordings contain a lot of pauses and silence. I want multi-level alignment (mainly word-level and segment/paragraph-level) as well as alignment of the pauses and silence, because it is important for me to know how long the inter-segment pauses are. However, when I use the command below, no pauses/silence are detected between segments. If I use is_text_type=plain instead of mplain, I do get alignment for those inter-segment pauses (as well as for the segments).

python -m aeneas.tools.execute_task sample_audio.mp3 sample_audio_transcript.txt "task_language=eng|os_task_file_format=json|is_text_type=mplain|task_adjust_boundary_nonspeech_min=0.0100|task_adjust_boundary_nonspeech_string=(sil)|task_adjust_boundary_algorithm=auto" sample_audio_output.multilevel.json

Why?
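
For reference, here is a minimal sketch of how the pause durations could be read out of the JSON output once the pauses are present. It assumes the sync map has a top-level "fragments" list whose entries carry "begin", "end" and "lines" keys, and that each detected pause shows up as a top-level fragment whose text is the (sil) placeholder from the command above. The helper name pause_durations and the sample data are hypothetical, for illustration only.

```python
import json
from decimal import Decimal


def pause_durations(sync_map, marker="(sil)"):
    """Return (begin, end, duration) for each top-level fragment whose text is the marker.

    Assumes the layout {"fragments": [{"begin": "...", "end": "...", "lines": [...]}]}.
    """
    pauses = []
    for fragment in sync_map.get("fragments", []):
        text = " ".join(fragment.get("lines", [])).strip()
        if text == marker:
            # Times are stored as strings; Decimal avoids float rounding noise.
            begin = Decimal(fragment["begin"])
            end = Decimal(fragment["end"])
            pauses.append((begin, end, end - begin))
    return pauses


# Hypothetical sample standing in for a real output file.
sample = json.loads("""
{
  "fragments": [
    {"begin": "0.000", "end": "2.500", "id": "f000001", "lines": ["First segment"], "children": []},
    {"begin": "2.500", "end": "3.250", "id": "f000002", "lines": ["(sil)"], "children": []},
    {"begin": "3.250", "end": "6.000", "id": "f000003", "lines": ["Second segment"], "children": []}
  ]
}
""")

for begin, end, duration in pause_durations(sample):
    print(f"pause from {begin}s to {end}s lasting {duration}s")

# For a real run, load the file written by execute_task instead, e.g.:
# with open("sample_audio_output.multilevel.json") as f:
#     pauses = pause_durations(json.load(f))
```

With is_text_type=mplain this returns an empty list for my files, which is the behaviour I am asking about.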

bwang482 avatar Oct 20 '19 23:10 bwang482

To be honest, I cannot answer off the top of my head. It might be a limitation of the current implementation of multilevel alignment. I would need to check the code.

pettarin avatar Jan 22 '20 21:01 pettarin

What is the structure of your transcripts?

Is it

Lorem Ipsum is simply dummy text of the printing and typesetting industry

or

Lorem 
Ipsum 
is 
simply
dummy 
text 
of 
the 
...

lokesh1199 avatar Aug 08 '23 05:08 lokesh1199