kaldi-offline-transcriber
Multi-core, multi-threading - possible?
An 8-core machine could plow through diarization faster if it were parallelized. What is the biggest obstacle to doing that?
By far the most time-consuming part of speaker diarization is the last step -- NCLR clustering. I don't know if this algorithm is easily parallelizable or not.
However, current speaker recognition models are not highly sensitive to absolutely correct speaker diarization, so you could actually omit NCLR clustering (and gender identification), and use show.spl.seg instead of show.seg as the diarization result. This would save you about 80% of the time.
@alumae thanks. Related to multi-threading ability, I'm also seeing crashes at a later stage:
[982571.380092] nnet3-latgen-fa[9579]: segfault at 0 ip 00007f3469bba1ab sp 00007f34367fbb60 error 6 in libopenblas_openmp_haswellp-r0.2.20.so[7f346996a000+3f5000]
Googling shows that OpenBLAS may have trouble with multithreading (at least with OpenMP enabled, which I have). Do you happen to have any experience with segfaults in the process? I'm testing a run of speech2text.sh with OMP_NUM_THREADS=1, but I'm not very hopeful that it will help. I should probably test with a small sample audio file, too.
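A minimal sketch of limiting OpenMP to one thread for a single command only, so the setting does not leak into the rest of the shell session. The helper name run_single_threaded is hypothetical, and a stand-in command replaces speech2text.sh because its arguments are not shown in this thread:

```shell
# Run any command with OpenMP limited to a single thread.
# `env` sets the variable only for the child process it launches.
run_single_threaded() {
    env OMP_NUM_THREADS=1 "$@"
}

# Stand-in command that reports the value the child process sees.
# Replace it with the real transcription command and its arguments.
run_single_threaded sh -c 'echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"'
```

Whether this avoids the segfault depends on the OpenBLAS build; it only guarantees that the child process starts with the variable set.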
No, I haven't seen this. I usually use Intel's MKL, not OpenBLAS, but of course that might not be an option for you.
Note that you can use parallel decoding (instead of multithreaded decoding) if you set e.g. njobs=4 in Makefile.options, but I think you could then run into problems if your audio file has fewer than njobs speakers.
Actually, it's probably OK to use parallel decoding even with fewer than njobs speakers. But if you have fewer than njobs utterances (segments), it could fail.
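One way to guard against that failure is to cap the job count at the number of segments before decoding. This is only a sketch: the helper name effective_njobs and the one-segment-per-line file layout are assumptions, not something the transcriber is known to provide.

```shell
# Cap the number of parallel jobs at the number of segments.
# $1: requested njobs
# $2: file with one segment per line (assumed layout)
effective_njobs() {
    requested=$1
    # Count non-empty lines; grep exits non-zero when the count is 0,
    # so `|| true` keeps the function usable under `set -e`.
    n_segments=$(grep -c . "$2" || true)
    if [ "$n_segments" -lt "$requested" ]; then
        echo "$n_segments"
    else
        echo "$requested"
    fi
}

# Example: 2 segments with 4 requested jobs.
tmp=$(mktemp)
printf 'seg1\nseg2\n' > "$tmp"
effective_njobs 4 "$tmp"   # prints 2
rm -f "$tmp"
```

The result could then be used as the njobs value instead of a fixed number. An empty segment file yields 0, which the caller would need to handle separately.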
It seems like eliminating --nthreads and using OMP_NUM_THREADS=1 worked. I will now transcribe another file, changing only a single variable. Perhaps it was --nthreads 8 all along.