autosub
Can't get speech regions
Make sure you have read the README and searched for and read the issues related to yours. Otherwise your issue will be considered a duplicate and closed immediately.
Describe the bug
I'm trying to transcribe a video using the pre-processing function, but autosub stops with "Error: Can't get speech regions."
To Reproduce
Steps to reproduce the behavior:
I opened a terminal and ran autosub with: -i "/media/teste.mp4" -ap y -S en-us
- Command line arguments you are using. Markdown code block syntax is recommended: paste them between the ``` fences.
-i "/media/teste.mp4" -ap y -S en-us
- A complete copy of the autosub command line output. You can use Ctrl-A and Ctrl-C to copy all of it.
Input args(without "autosub"): -i "/media/teste.mp4" -ap y -S en-us
/usr/bin/ffmpeg -hide_banner -i "/media/teste.mp4" -vn -af "asplit[a],aphasemeter=video=0,ametadata=select:key=lavfi.aphasemeter.phase:value=-0.005:function=less,pan=1c|c0=c0,aresample=async=1:first_pts=0,[a]amix" -ac 1 -f flac -loglevel error "/tmp/tmpanuinulr.flac"
Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpanuinulr.flac" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpanuinulr.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0:00:00.000000
duration=0:00:00.116100
size=14.130859 Kibyte
bit_rate=997.071000 Kbit/s
probe_score=100
TAG:major_brand=isom
TAG:minor_version=512
TAG:compatible_brands=isomiso2avc1mp41
TAG:title=teste
TAG:artist=teste
TAG:date=2017
TAG:comment=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]
/usr/bin/ffmpeg -hide_banner -i "/tmp/tmpanuinulr.flac" -af "lowpass=3000,highpass=200" -loglevel error "/tmp/tmpvr_t1lo_.flac"
Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpvr_t1lo_.flac" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpvr_t1lo_.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0:00:00.000000
duration=0:00:00.116100
size=12.680664 Kibyte
bit_rate=894.745000 Kbit/s
probe_score=100
TAG:major_brand=isom
TAG:minor_version=512
TAG:compatible_brands=isomiso2avc1mp41
TAG:title=teste
TAG:artist=teste
TAG:date=2017
TAG:comment=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]
/home/mestre/.pyenv/versions/3.8.5/bin/ffmpeg-normalize -v "/tmp/tmpvr_t1lo_.flac" -ar 44100 -ofmt flac -c:a flac -pr -p -o "/tmp/tmp19_y75h4.flac"
Stream 1/1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 1006.46it/s]
Second Pass: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 915.39it/s]
File: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 4.73it/s]
Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmp19_y75h4.flac" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmp19_y75h4.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0:00:00.000000
duration=0:00:00.116100
size=12.829102 Kibyte
bit_rate=905.219000 Kbit/s
probe_score=100
TAG:major_brand=isom
TAG:minor_version=512
TAG:compatible_brands=isomiso2avc1mp41
TAG:title=teste
TAG:artist=teste
TAG:date=2017
TAG:comment=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]
Audio pre-processing complete.
Translation destination language not provided. Only performing speech recognition.
Override "-of"/"--output-files" due to your args too few.
Output source subtitles file only.
Convert source file to "/tmp/tmpd8ys3vqq.wav" to detect audio regions.
/usr/bin/ffmpeg -hide_banner -y -i "/tmp/tmp19_y75h4.flac" -vn -ac 1 -ar 48000 -loglevel error "/tmp/tmpd8ys3vqq.wav"
Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpd8ys3vqq.wav" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpd8ys3vqq.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=0:00:00.116104
size=11.082031 Kibyte
bit_rate=781.919000 Kbit/s
probe_score=99
TAG:artist=teste
TAG:comment=teste
TAG:date=2017
TAG:title=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]
Conversion completed.
Use Auditok to detect speech regions.
Auditok detection completed.
"/tmp/tmpd8ys3vqq.wav" has been deleted.
Error: Can't get speech regions.
Press Enter to exit...
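Every ffprobe check in the log above reports duration=0:00:00.116100 (about 0.12 s), starting with the output of the very first pre-processing ffmpeg command. If teste.mp4 is longer than that, the audio is likely being cut short at that first step, which would leave Auditok almost nothing to scan. The sketch below pulls the duration out of ffprobe's "-show_format -pretty" text so each intermediate file can be checked; the helper name and the 1-second threshold are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical threshold: audio shorter than this is unlikely to hold a speech region.
MIN_SECONDS = 1.0

def parse_pretty_duration(ffprobe_output):
    """Return the duration in seconds from ffprobe '-show_format -pretty' text.

    Expects a line like 'duration=0:00:00.116100' (H:MM:SS.micro).
    Returns None if the field is missing or is 'N/A'.
    """
    match = re.search(
        r"^duration=(\d+):(\d{2}):(\d{2}(?:\.\d+)?)\s*$", ffprobe_output, re.MULTILINE
    )
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Excerpt of the first ffprobe result from the log above.
sample = """[FORMAT]
filename=/tmp/tmpanuinulr.flac
duration=0:00:00.116100
[/FORMAT]"""

duration = parse_pretty_duration(sample)
if duration is None:
    print("no duration field found")
else:
    print(f"duration = {duration:.3f} s")
    if duration < MIN_SECONDS:
        print("warning: converted audio is suspiciously short")
```

Running the same ffprobe command on the original teste.mp4 and comparing the two durations would show whether the first filter chain is truncating the audio.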
No custom config used.
Environment (please complete the following information):
- OS: Ubuntu 20.04
- Python Version: python 3.8.5
- Autosub Version: latest dev autosub==0.5.7a0
Check the volume of your audio file to make sure it's mostly above -20 dB, or use the -k option to keep all the intermediate files and review them.
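One rough way to check "mostly above -20 dB" on a kept intermediate WAV is to measure the RMS level of short windows and count how many exceed the threshold. This is a standard-library sketch that assumes 16-bit PCM WAV input and reads -20 dB as dBFS; it is not how autosub or Auditok compute their own energy threshold, and the function names are hypothetical.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32768  # magnitude of the most negative signed 16-bit sample

def rms_dbfs(samples):
    """RMS level of a sequence of signed 16-bit samples, in dBFS."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-9) / FULL_SCALE)  # floor avoids log10(0)

def fraction_above(path, threshold_db=-20.0, window_ms=50):
    """Share of fixed-size windows in a 16-bit PCM WAV whose RMS exceeds threshold_db."""
    above = total = 0
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV")
        frames_per_window = max(1, wav.getframerate() * window_ms // 1000)
        while True:
            raw = wav.readframes(frames_per_window)
            if not raw:
                break
            samples = array.array("h")
            samples.frombytes(raw)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV sample data is little-endian
            total += 1
            above += rms_dbfs(samples) > threshold_db
    return above / total if total else 0.0
```

Usage, with a hypothetical file name for a WAV kept by -k: print(fraction_above("kept_intermediate.wav")). A value near 1.0 means most of the audio sits above the threshold.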
But if I don't use the preprocessing option, it works. The crash only happens when preprocessing is enabled. The volume seems fine to me.
Installing auditok from its git repo partly solved the issue:
pip install git+https://github.com/amsehili/auditok
Now I get a different message when typing the same command line:
Conversion completed.
Use Auditok to detect speech regions.
Traceback (most recent call last):
File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 1007, in __getattr__
return getattr(self._audio_source, name)
File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 856, in __getattr__
return getattr(self._audio_source, name)
File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 736, in __getattr__
return getattr(self._audio_source, name)
AttributeError: 'BufferAudioSource' object has no attribute 'get_sample_width'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/mestre/.pyenv/versions/3.8.5/bin/autosub", line 33, in <module>
sys.exit(load_entry_point('autosub==0.5.7a0', 'console_scripts', 'autosub')())
File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/autosub/__init__.py", line 159, in main
cmdline_utils.audio_or_video_prcs(args,
File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/autosub/cmdline_utils.py", line 1357, in audio_or_video_prcs
regions = auditok_utils.auditok_gen_speech_regions(
File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/autosub/auditok_utils.py", line 31, in auditok_gen_speech_regions
sample_width=asource.get_sample_width(),
File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 1009, in __getattr__
raise AttributeError(
AttributeError: 'AudioReader' has no attribute 'get_sample_width'
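The traceback shows autosub's auditok_utils.py calling asource.get_sample_width(), a getter the git version of Auditok no longer provides. Besides pinning the version, a caller could tolerate both API styles with a small fallback. In this sketch the helper and the two stand-in classes are hypothetical, and the assumption that newer Auditok exposes the value as a plain sample_width attribute should be verified against its documentation.

```python
def get_audio_param(source, name):
    """Read a parameter such as name="sample_width" from an audio source.

    Tries the old-style getter (source.get_sample_width()) first, then
    falls back to a plain attribute (source.sample_width). Which style a
    given Auditok version uses is an assumption to check in its docs.
    """
    getter = getattr(source, "get_" + name, None)  # None if the getter is missing
    if callable(getter):
        return getter()
    try:
        return getattr(source, name)
    except AttributeError:
        raise AttributeError(
            f"{type(source).__name__} exposes neither get_{name}() nor {name}"
        ) from None

# Stand-ins for the two API styles (hypothetical, for illustration only).
class OldStyleSource:
    def get_sample_width(self):
        return 2

class NewStyleSource:
    sample_width = 2

print(get_audio_param(OldStyleSource(), "sample_width"))  # 2
print(get_audio_param(NewStyleSource(), "sample_width"))  # 2
```

Because getattr with a default swallows the AttributeError raised by a delegating __getattr__, the fallback also works for wrapper objects like the AudioReader in the traceback.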
Alright. The new error is caused by the incompatibility described in https://github.com/BingLingGroup/autosub/issues/137#issuecomment-704807384. I will change https://github.com/BingLingGroup/autosub/blob/dev/setup.py to make sure users won't install an incompatible version of Auditok.
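The setup.py change would normally be a requirement specifier on Auditok, for example "auditok<0.2.0"; the exact upper bound here is an assumption, not taken from this thread. A complementary runtime check can report the installed version before autosub reaches the failing call. This sketch uses importlib.metadata (Python 3.8+), and the helper names and bound are hypothetical.

```python
from importlib import metadata

# Assumption: Auditok releases before 0.2.0 still provide get_sample_width().
# Confirm against the linked issue comment before relying on this bound.
MAX_EXCLUSIVE = (0, 2, 0)

def parse_release(version):
    """Turn '0.1.5' or '0.2.0.dev0' into a comparable tuple of its leading integer parts."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at pre-release/dev suffixes such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)

def auditok_is_compatible(version):
    return parse_release(version) < MAX_EXCLUSIVE

def check_installed_auditok():
    try:
        installed = metadata.version("auditok")
    except metadata.PackageNotFoundError:
        return "auditok is not installed"
    if auditok_is_compatible(installed):
        return f"auditok {installed} looks compatible"
    return f'auditok {installed} is probably too new; try: pip install "auditok<0.2.0"'

print(check_installed_auditok())
```

A git checkout such as 0.2.0.dev0 is treated as 0.2.0 and therefore flagged, which matches the behavior seen after installing from the git repo.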
About the preprocessing options: I took them from the script in this issue, https://github.com/agermanidis/autosub/issues/40. I also documented the source and purpose of these commands here: https://github.com/BingLingGroup/autosub#input.