
Did not successfully decode file BAC009S0002W0125, len = 629

[Open] haha010508 opened this issue 5 years ago • 6 comments

What kind of problem is this?

haha010508, Aug 23 '19 07:08

Same question here; I ran into this problem as well.

lidianxiang, Oct 16 '19 03:10

I'm running into this problem too.

HW140701, Oct 22 '19 03:10

@HW140701 For this problem, I saw a workaround in montreal-forced-aligner: gradually increase the beam value until it is large enough, and you can then get the TextGrid file.
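The workaround amounts to a simple search over the beam width. Below is a minimal Python sketch under several assumptions: speech-aligner accepts `--beam`/`--retry-beam` on the command line ahead of the positional arguments, with command-line values overriding those read via `--config` as in other Kaldi tools; and a failed utterance shows up as the "Did not successfully decode" warning on stderr. `run_speech_aligner` and `find_working_beam` are hypothetical helpers, not part of speech-aligner. The retry beam is kept at twice the beam, mirroring the 40/80 setting reported later in this thread, and doubling from 20 tops out at 10240, the largest value reported in the thread.

```python
import subprocess
from typing import Callable, Optional

# Warning text from the issue title; assumed to appear on stderr when an utterance fails.
FAILURE_MARKER = "Did not successfully decode"


def run_speech_aligner(beam: int, retry_beam: int, config: str,
                       wav_list: str, text: str, out: str) -> bool:
    """Hypothetical wrapper: run speech-aligner once, return True on success."""
    cmd = [
        "speech-aligner",
        f"--config={config}",
        # Assumed to override the beam values read from the config file.
        f"--beam={beam}",
        f"--retry-beam={retry_beam}",
        wav_list, text, out,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0 and FAILURE_MARKER not in result.stderr


def find_working_beam(align: Callable[[int, int], bool],
                      start_beam: int = 20,
                      max_beam: int = 10240) -> Optional[int]:
    """Double the beam until align(beam, retry_beam) succeeds.

    The retry beam is kept at twice the beam. Returns the first beam that
    worked, or None if even max_beam failed.
    """
    beam = start_beam
    while beam <= max_beam:
        if align(beam, 2 * beam):
            return beam
        beam *= 2
    return None
```

For the merged example later in the thread, a call would look like `find_working_beam(lambda b, r: run_speech_aligner(b, r, "conf/align.conf", "merged.lst", "merged.txt", "merged.out"))`. Doubling rather than stepping linearly keeps the number of reruns logarithmic, which matters because each rerun at a large beam is slow.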

lidianxiang, Oct 26 '19 11:10

I installed the repo successfully, but I get the following error when I run it. Do you know how to solve it?

/bin/speech-aligner --config=egs/cn_phn/conf/align.conf egs/cn_phn/data/wav.scp egs/cn_phn/data/text egs/cn_phn/data/out.ali

ERROR (speech-aligner[5.4.215~4-f2b7]:Input():util/kaldi-io.cc:756) Error opening input stream res/tree

[ Stack-Trace: ]
kaldi::MessageLogger::HandleMessage(kaldi::LogMessageEnvelope const&, char const*)
kaldi::MessageLogger::~MessageLogger()
kaldi::Input::Input(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool*)
main
__libc_start_main
_start

My setup: Ubuntu 16.04, CMake 3.9.1
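The error says the aligner could not open `res/tree`, which is a relative path. A relative path is resolved against the process's current working directory, so one possible cause is launching the binary from a directory other than the one containing `res/`; where `res/` actually lives in the repository is not stated in this thread. A minimal pre-flight sketch, where `missing_inputs` is a hypothetical helper and not part of speech-aligner:

```python
from pathlib import Path
from typing import Iterable, List


def missing_inputs(paths: Iterable[str], cwd: str = ".") -> List[str]:
    """Return the paths that do not exist when resolved against cwd.

    Relative paths such as "res/tree" are joined onto cwd; absolute paths
    are checked as-is (pathlib ignores the base for absolute paths).
    """
    base = Path(cwd)
    return [p for p in paths if not (base / p).exists()]
```

Running `missing_inputs(["res/tree", "egs/cn_phn/conf/align.conf", "egs/cn_phn/data/wav.scp", "egs/cn_phn/data/text"])` from the directory you launch speech-aligner in lists whichever of the files from the command above cannot be found there.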


WhiteFu, Dec 11 '19 13:12

> @HW140701 For this problem, I saw a workaround in montreal-forced-aligner: gradually increase the beam value until it is large enough, and you can then get the TextGrid file.

This works for me. Here is my experiment:

  1. Merge two sample files with ffmpeg:

ffmpeg -i BAC009S0002W0122.wav -i BAC009S0002W0123.wav -filter_complex '[0:0][1:0]concat=n=2:v=0:a=1[out]' -map '[out]' merged.wav

  2. Create a new playlist called merged.lst with the following content:

merged merged.wav

  3. Also create a merged transcript called merged.txt, combining the transcripts of the two files.

  4. In run.sh, run the following command:

speech-aligner --config=conf/align.conf merged.lst merged.txt merged.out

(This should fail.)

  5. Now edit align.conf and set:

--beam=40
--retry-beam=80

(Now it should work.)
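Step 1 uses ffmpeg's concat filter. When both recordings already share the same sample rate, sample width and channel count, the same join can be sketched with Python's standard `wave` module instead. This is a swapped-in technique, not what was used above; `concat_wavs` is a hypothetical helper, and it refuses inputs whose formats differ rather than resampling them the way ffmpeg can.

```python
import wave
from typing import Sequence


def concat_wavs(inputs: Sequence[str], output: str) -> int:
    """Append the PCM frames of each input WAV into one output WAV.

    Assumes every input has the same channel count, sample width and
    sample rate; raises ValueError otherwise. Returns the total number
    of frames written.
    """
    if not inputs:
        raise ValueError("need at least one input file")
    params = None
    total = 0
    with wave.open(output, "wb") as out:
        for path in inputs:
            with wave.open(path, "rb") as src:
                fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if params is None:
                    # The first file fixes the output format.
                    params = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != params:
                    raise ValueError(f"{path}: format {fmt} differs from {params}")
                n = src.getnframes()
                out.writeframes(src.readframes(n))
                total += n
    return total
```

Usage mirroring step 1: `concat_wavs(["BAC009S0002W0122.wav", "BAC009S0002W0123.wav"], "merged.wav")`.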

windy32, Mar 31 '20 09:03

I also tested another audio file, 49 seconds long. To complete the alignment, I had to increase the beam to 10240, and it ran much more slowly.

I guess that's why the input audio must be a playlist: by design, the aligner is meant to process a list of sentences, each in its own audio file, in which case a beam of 20 or 40 should be enough.
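Following that design, each sentence can get its own line in the playlist and transcript instead of being merged into one long recording. A minimal sketch that builds both files' contents from (ID, WAV path, text) triples; `build_lists` is hypothetical, the playlist line format copies `merged merged.wav` from the experiment above, and the `<utt-id> <text>` transcript format is an assumption based on Kaldi's data/text convention.

```python
from typing import List, Sequence, Tuple

# (utterance id, wav path, transcript) for one sentence-level recording
Utterance = Tuple[str, str, str]


def build_lists(utts: Sequence[Utterance]) -> Tuple[str, str]:
    """Return (playlist, transcript) text with one line per sentence.

    Lines are sorted by utterance ID, following the usual Kaldi convention
    for scp/text files; duplicate IDs are rejected.
    """
    seen = set()
    playlist: List[str] = []
    transcript: List[str] = []
    for utt_id, wav, text in sorted(utts):
        if utt_id in seen:
            raise ValueError(f"duplicate utterance id: {utt_id}")
        seen.add(utt_id)
        playlist.append(f"{utt_id} {wav}")
        transcript.append(f"{utt_id} {text}")
    return ("".join(line + "\n" for line in playlist),
            "".join(line + "\n" for line in transcript))
```

The two returned strings can be written (as UTF-8, for Chinese transcripts) to files passed to speech-aligner in place of merged.lst and merged.txt, keeping each utterance short enough for a small beam.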

windy32, Mar 31 '20 10:03