
Can't get speech regions

Open gabriellluz opened this issue 3 years ago • 3 comments

Make sure you have read the readme, searched and read the issues related to yours. Otherwise it will be considered as a duplicate which will be closed immediately.

Describe the bug: I'm trying to transcribe a video with the audio pre-processing function enabled, and autosub exits with "Error: Can't get speech regions."

To Reproduce: I opened a terminal and ran autosub with the arguments -i "/media/teste.mp4" -ap y -S en-us

  • Command line arguments you are using. Using the following markdown code block syntax is recommended. Copy them into the place between ```.
-i "/media/teste.mp4" -ap y -S en-us
  • A complete copy of the command line output of autosub. You can use Ctrl-A and Ctrl-C to copy all of it.
Input args(without "autosub"): -i "/media/teste.mp4" -ap y -S en-us
/usr/bin/ffmpeg -hide_banner -i "/media/teste.mp4" -vn -af "asplit[a],aphasemeter=video=0,ametadata=select:key=lavfi.aphasemeter.phase:value=-0.005:function=less,pan=1c|c0=c0,aresample=async=1:first_pts=0,[a]amix" -ac 1 -f flac -loglevel error "/tmp/tmpanuinulr.flac"

Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpanuinulr.flac" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpanuinulr.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0:00:00.000000
duration=0:00:00.116100
size=14.130859 Kibyte
bit_rate=997.071000 Kbit/s
probe_score=100
TAG:major_brand=isom
TAG:minor_version=512
TAG:compatible_brands=isomiso2avc1mp41
TAG:title=teste
TAG:artist=teste
TAG:date=2017
TAG:comment=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]

/usr/bin/ffmpeg -hide_banner -i "/tmp/tmpanuinulr.flac" -af "lowpass=3000,highpass=200" -loglevel error "/tmp/tmpvr_t1lo_.flac"

Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpvr_t1lo_.flac" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpvr_t1lo_.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0:00:00.000000
duration=0:00:00.116100
size=12.680664 Kibyte
bit_rate=894.745000 Kbit/s
probe_score=100
TAG:major_brand=isom
TAG:minor_version=512
TAG:compatible_brands=isomiso2avc1mp41
TAG:title=teste
TAG:artist=teste
TAG:date=2017
TAG:comment=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]

/home/mestre/.pyenv/versions/3.8.5/bin/ffmpeg-normalize -v "/tmp/tmpvr_t1lo_.flac" -ar 44100 -ofmt flac -c:a flac -pr -p -o "/tmp/tmp19_y75h4.flac"
Stream 1/1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 1006.46it/s]
Second Pass: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 100/100 [00:00<00:00, 915.39it/s]
File: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  4.73it/s]

Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmp19_y75h4.flac" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmp19_y75h4.flac
nb_streams=1
nb_programs=0
format_name=flac
format_long_name=raw FLAC
start_time=0:00:00.000000
duration=0:00:00.116100
size=12.829102 Kibyte
bit_rate=905.219000 Kbit/s
probe_score=100
TAG:major_brand=isom
TAG:minor_version=512
TAG:compatible_brands=isomiso2avc1mp41
TAG:title=teste
TAG:artist=teste
TAG:date=2017
TAG:comment=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]

Audio pre-processing complete.
Translation destination language not provided. Only performing speech recognition.
Override "-of"/"--output-files" due to your args too few.
Output source subtitles file only.

Convert source file to "/tmp/tmpd8ys3vqq.wav" to detect audio regions.
/usr/bin/ffmpeg -hide_banner -y -i "/tmp/tmp19_y75h4.flac" -vn -ac 1 -ar 48000 -loglevel error "/tmp/tmpd8ys3vqq.wav"

Use ffprobe to check conversion result.
/usr/bin/ffprobe "/tmp/tmpd8ys3vqq.wav" -show_format -pretty -loglevel quiet
[FORMAT]
filename=/tmp/tmpd8ys3vqq.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=0:00:00.116104
size=11.082031 Kibyte
bit_rate=781.919000 Kbit/s
probe_score=99
TAG:artist=teste
TAG:comment=teste
TAG:date=2017
TAG:title=teste
TAG:encoder=Lavf58.29.100
[/FORMAT]

Conversion completed.
Use Auditok to detect speech regions.
Auditok detection completed.
"/tmp/tmpd8ys3vqq.wav" has been deleted.
Error: Can't get speech regions.
Press Enter to exit...
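
Every ffprobe block in the log above reports a duration of about 0.116 seconds for the intermediate files, which may be worth comparing against the length of the original video. As a minimal sketch (the helper names `parse_ffprobe_format` and `pretty_duration_to_seconds` are made up for illustration and are not part of autosub), the `[FORMAT]` blocks can be parsed like this to pull out the duration:

```python
def parse_ffprobe_format(text):
    """Collect key=value pairs found between [FORMAT] and [/FORMAT]."""
    fields = {}
    inside = False
    for line in text.splitlines():
        line = line.strip()
        if line == "[FORMAT]":
            inside = True
        elif line == "[/FORMAT]":
            inside = False
        elif inside and "=" in line:
            key, value = line.split("=", 1)
            fields[key] = value
    return fields


def pretty_duration_to_seconds(value):
    """Convert a duration printed with -pretty (H:MM:SS.micro) to seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Excerpt of the first ffprobe block from the log in this issue.
sample = """[FORMAT]
filename=/tmp/tmpanuinulr.flac
format_name=flac
duration=0:00:00.116100
[/FORMAT]"""

fields = parse_ffprobe_format(sample)
duration = pretty_duration_to_seconds(fields["duration"])
print(fields["format_name"], duration)
```

If the source video is much longer than this value, the first pre-processing command would be the place to start looking.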

No custom config used.

Environment (please complete the following information):

  • OS: Ubuntu 20.04
  • Python Version: python 3.8.5
  • Autosub Version: latest dev autosub==0.5.7a0

gabriellluz avatar Oct 11 '20 01:10 gabriellluz

Check the volume of your audio file to make sure it's mostly above -20 dB, or use the -k option to keep all the intermediate files and review them.
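
One rough way to check the level of a kept intermediate file is sketched below using only the Python standard library. It is an illustration, not something autosub does itself: the function names are made up, it only handles 16-bit PCM WAV, and it assumes the -20 dB figure can be compared against an overall RMS level relative to full scale (the comment above does not say whether peak or RMS is meant).

```python
import array
import math
import sys
import wave


def rms_dbfs(samples, full_scale=32768):
    """RMS level of signed 16-bit samples, in dB relative to full scale."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # pure digital silence
    return 10 * math.log10(mean_square / (full_scale ** 2))


def wav_rms_dbfs(path):
    """Read a 16-bit PCM WAV file and return its overall RMS level in dBFS."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return rms_dbfs(samples)


# A square wave at half of full scale sits about 6 dB below full scale.
level = rms_dbfs([16384, -16384] * 100)
print(f"{level:.2f} dBFS")  # prints "-6.02 dBFS"
```

Calling `wav_rms_dbfs` on a WAV file kept with -k gives a single number to compare against the threshold. A whole-file RMS hides quiet stretches, so listening to the file is still worthwhile.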

BingLingGroup avatar Oct 11 '20 05:10 BingLingGroup

But if I don't use the preprocessing option, it works. The crash only happens when I use preprocessing. The volume seems OK to me.

Installing auditok from their git repo kinda solved the issue.

pip install git+https://github.com/amsehili/auditok

Now I get a different message when typing the same command line:

Conversion completed.

Use Auditok to detect speech regions.
Traceback (most recent call last):
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 1007, in __getattr__
    return getattr(self._audio_source, name)
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 856, in __getattr__
    return getattr(self._audio_source, name)
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 736, in __getattr__
    return getattr(self._audio_source, name)
AttributeError: 'BufferAudioSource' object has no attribute 'get_sample_width'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mestre/.pyenv/versions/3.8.5/bin/autosub", line 33, in <module>
    sys.exit(load_entry_point('autosub==0.5.7a0', 'console_scripts', 'autosub')())
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/autosub/__init__.py", line 159, in main
    cmdline_utils.audio_or_video_prcs(args,
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/autosub/cmdline_utils.py", line 1357, in audio_or_video_prcs
    regions = auditok_utils.auditok_gen_speech_regions(
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/autosub/auditok_utils.py", line 31, in auditok_gen_speech_regions
    sample_width=asource.get_sample_width(),
  File "/home/mestre/.pyenv/versions/3.8.5/lib/python3.8/site-packages/auditok/util.py", line 1009, in __getattr__
    raise AttributeError(
AttributeError: 'AudioReader' has no attribute 'get_sample_width'

gabriellluz avatar Oct 11 '20 09:10 gabriellluz

Alright. The new error can be attributed to this: https://github.com/BingLingGroup/autosub/issues/137#issuecomment-704807384. I will change https://github.com/BingLingGroup/autosub/blob/dev/setup.py to make sure users won't install an incompatible version of Auditok.
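
For illustration only, the traceback above fails because the `get_sample_width()` getter is missing on the audio source object from the git version of Auditok. A version-tolerant accessor could look roughly like the sketch below. The helper name `read_sample_width` is made up, and the fallback to a plain `sample_width` attribute is an assumption about the newer Auditok API that this thread does not confirm. The two small classes are stand-ins so the sketch runs without Auditok installed.

```python
def read_sample_width(source):
    """Return the sample width in bytes from either API style."""
    # Older style seen in the traceback: a get_sample_width() method.
    getter = getattr(source, "get_sample_width", None)
    if callable(getter):
        return getter()
    # Assumption: newer sources expose the value as a plain attribute.
    return source.sample_width


class OldStyleSource:
    """Stand-in for a source object with the getter method."""

    def get_sample_width(self):
        return 2


class NewStyleSource:
    """Stand-in for a source object with only an attribute (assumed)."""

    sample_width = 2


print(read_sample_width(OldStyleSource()), read_sample_width(NewStyleSource()))
```

Pinning the dependency in setup.py, as described above, avoids the need for a shim like this.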

About the preprocessing options: I got them from the script in this issue https://github.com/agermanidis/autosub/issues/40 . I also mentioned the source and the function of these commands here https://github.com/BingLingGroup/autosub#input .
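
To inspect each pre-processing stage by hand, the three commands from the log in this issue can be rebuilt as argument lists. This is a sketch rather than autosub's own code: the function name and the output file names are placeholders, and it assumes `ffmpeg` and `ffmpeg-normalize` are on PATH. The filter strings and flags are copied from the log above.

```python
import shlex

# Filter chain of the first command, copied verbatim from the log.
PHASE_FILTER = (
    "asplit[a],aphasemeter=video=0,"
    "ametadata=select:key=lavfi.aphasemeter.phase:value=-0.005:function=less,"
    "pan=1c|c0=c0,aresample=async=1:first_pts=0,[a]amix"
)


def build_preprocess_commands(src, out1, out2, out3):
    """Return the three pre-processing commands as argument lists."""
    return [
        ["ffmpeg", "-hide_banner", "-i", src, "-vn", "-af", PHASE_FILTER,
         "-ac", "1", "-f", "flac", "-loglevel", "error", out1],
        ["ffmpeg", "-hide_banner", "-i", out1,
         "-af", "lowpass=3000,highpass=200", "-loglevel", "error", out2],
        ["ffmpeg-normalize", "-v", out2, "-ar", "44100", "-ofmt", "flac",
         "-c:a", "flac", "-pr", "-p", "-o", out3],
    ]


# Placeholder output names; only the input path comes from the issue.
commands = build_preprocess_commands(
    "/media/teste.mp4", "step1.flac", "step2.flac", "step3.flac"
)
for command in commands:
    print(shlex.join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` one at a time, checking the output file after every step to see where the audio goes wrong.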

BingLingGroup avatar Oct 12 '20 05:10 BingLingGroup