
On-the-fly contextual adaptation

Open · KarelVesely84 opened this pull request 4 years ago · 18 comments

Adding code for 'lattice boosting' and 'HCLG boosting', to do on-the-fly contextual adaptation by WFST composition:

  • this PR contains the code to do the composition; each utterance can be adapted with a different FST
  • this PR does not contain the code to prepare the WFST 'adaptation graphs' (see the sketch after this list for the kind of graph intended)
  • for HCLG boosting the graph should be simple (a single-state graph); for more complicated boosting graphs the HCLG o B composition is too slow...
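
For illustration, here is a minimal sketch (my own, not code from this PR) of such a single-state boosting graph B: every word gets a self-loop, and the boosted words carry a negative (favorable) cost, so HCLG o B keeps all paths and only reweights the chosen words. The function name and boost value are hypothetical.

#include <set>
#include <vector>
#include <fst/fstlib.h>

// Single-state boosting graph B: accepts any word sequence, but lowers
// the cost of arcs labeled with a boosted word by 'boost'.
fst::StdVectorFst MakeBoostingGraph(const std::vector<int> &vocab,
                                    const std::set<int> &boosted_words,
                                    float boost /* e.g. 2.0 */) {
  fst::StdVectorFst B;
  fst::StdArc::StateId s = B.AddState();
  B.SetStart(s);
  B.SetFinal(s, fst::StdArc::Weight::One());
  for (int w : vocab) {
    float cost = boosted_words.count(w) ? -boost : 0.0f;
    B.AddArc(s, fst::StdArc(w, w, cost, s));  // self-loop, ilabel == olabel == word-id
  }
  fst::ArcSort(&B, fst::ILabelCompare<fst::StdArc>());  // required on the right side of composition
  return B;
}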

KarelVesely84 avatar Jun 18 '21 15:06 KarelVesely84

This is a super interesting observation, thanks @vesis84!

        //       OpenFst docs say that more specific iterators
        //       are faster than generic iterators. And HCLG
        //       is usually loaded for decoding as ConstFst.
        //
        //       auto decode_fst_ = ConstFst<StdArc>(decode_fst);
        //
        //       Following that, I tried to cast VectorFst to ConstFst,
        //       but this made the decoding 20% slower.

Very related to #4564. While we're trying to squeeze 5% out of it...

kkm000 avatar Jun 18 '21 19:06 kkm000

Hm... this is interesting.

IIRC, ConstFst is actually faster to read, write and access than VectorFst, as it has less fragmentation. That is why the ReadFstKaldiGeneric()-related functions were added, casting the VectorFst HCLG to a ConstFst one when the graph is built. I checked my old emails and found the following paragraph, forwarded by Dan, where someone said: "We find that const FSTs load from storage significantly faster than vector FSTs. They also take up less space in memory. Const FSTs also speed up decoding by a measurable amount, as they seem to traverse more quickly. Curiously, we see that const FSTs take up approximately 10% more space on disk when compared to vector FSTs."

As for specific iterators being faster than generic iterators, I remember that is why the following code was added:

// Dispatch to the ConstFst specialization, so decoding uses the faster
// type-specific arc iterator instead of the generic one.
if (fst_->Type() == "const") {
  LatticeFasterDecoderTpl<fst::ConstFst<fst::StdArc>, Token> *this_cast =
      reinterpret_cast<LatticeFasterDecoderTpl<fst::ConstFst<fst::StdArc>, Token>*>(this);
  this_cast->AdvanceDecoding(decodable, max_num_frames);
  return;
}
...

LvHang avatar Jun 19 '21 05:06 LvHang

@LvHang, AFAICR, ConstFst is not transformed in any way when loaded; it can be mapped directly into the address space with mmap(2). This means it has offsets where the VectorFst has pointers. Offset arithmetic may impose a certain overhead, although the 20% difference is hard to explain by that alone. Possibly, a loss of cache locality is at play here. A good profiler (Intel VTune is freely available now), ideally running on real hardware (even if a VM has a vPMU, I've personally seen a lot of glitches with these; without a PMU, these statistics are unavailable), will collect reliable cache-miss statistics.

Also, it's important to do the comparison with optimization turned up to 11: a recent compiler, like GCC 10.2 or Clang 11 or 12; -O3 is mandatory (especially with GCC, as -O2 does not imply loop unrolling, unlike Clang); and all architectural advantages taken (e.g., if you run it on a Skylake-X Xeon with AVX512, specify -march=skylake-avx512; the default -msse2 does not cut it at all).

kkm000 avatar Jun 19 '21 06:06 kkm000

this PR does not contain the code to prepare the WFST 'adaptation graphs',

Is it a good idea to merge this PR as-is, without an example WFST that this is known to work with? Recall that the lookahead decoding PR came with an example recipe.

@LvHang @kkm000 I suspect something much, much simpler is going on. FST composition creates a VectorFst as output, and that VectorFst is used only once (to decode a single utterance). Why convert to ConstFst if you're going to run the FST just once? Keep in mind that decoding typically visits <10% of the FST's states, so the overhead of conversion (which touches 100% of states and 100% of arcs) is not worth it here.

galv avatar Jun 19 '21 16:06 galv

@galv, you're almost certainly right.

@vesis84, only the profiler knows the ultimate truth. When I profile a piece of sluggish code for the first time, I get surprises 🎉🎉 more often than not. Like, InfoZip's unzip spends 90% of decompression time in libz's crc32() function. Who'd expect that!

kkm000 avatar Jun 21 '21 06:06 kkm000

Hi, thank you for the feedback. I adopted some of the suggestions.

Some other suggestions point into the 'old' code that was taken, copied and modified. I don't think we should deviate from the style of that original code too much; it would be good if it stays possible to diff against the original code.

Another thing is that sometimes the code seems 'over-explicit', and this is for the good reason of simple readability. It might annoy some people, but it also helps others. So it is difficult to decide about the shortenings. I try to stick with the style of the other code as much as possible.

I'll process your inputs and re-open later. Thanks, Karel

KarelVesely84 avatar Jun 21 '21 15:06 KarelVesely84

@vesis84

sometimes the code seems to be 'over-explicit', and this is for a good reason of simple readability. It might annoy some people, but also help other people.

Readability = good. I have not seen anyone annoyed because he had too little trouble reading code :) I've been annoyed by unreadable code myself. Unless you are referring to the comment about a stack object destroyed at end of scope, which is really excessive, I'd say. One needs to know basic C++ to read code.

I don't think we should deviate from the style of that original code too much

I think that we should do what we say we do and follow the coding style, even though we have old code which doesn't, and fix the old code when we touch it. Or stop saying that we have the standard. Being of two minds in this respect is the worst.

One's logical order of includes is another's illogical order. Alphabetical order is unambiguous. Imagine I edit something unrelated in this file and casually rearrange the includes because I think my order is more logical; you then rearrange them again. The last thing I want to see is a battle of commits. Unless you have an idea how to unambiguously standardize the "logical order," it's better to stick tightly to the written guidelines.

One day I'll unleash clang-format on the codebase, and it will ultimately squish all style arguments. Which is a good thing, as I want to spend the first day of the rest of my life kaldying, not arguing about our coding style. :)

kkm000 avatar Jun 22 '21 18:06 kkm000

That ClassifyRspecifier() call has no side effects, at least if the last 2 args are null, so that should be OK.

On Wed, Jun 23, 2021 at 3:04 AM kkm000 @.***> wrote:

@.**** commented on this pull request.

In src/nnet3bin/nnet3-latgen-faster-compose.cc https://github.com/kaldi-asr/kaldi/pull/4571#discussion_r656503586:

+  if (word_syms_filename != "") {
+    word_syms.reset(fst::SymbolTable::ReadText(word_syms_filename));
+    if (!word_syms)
+      KALDI_ERR << "Could not read symbol table from file "
+                << word_syms_filename;
+  }
+  double tot_like = 0.0;
+  kaldi::int64 frame_count = 0;
+  int num_success = 0, num_fail = 0;
+  // this compiler object allows caching of computations across
+  // different utterances.
+  CachingOptimizingCompiler compiler(am_nnet.GetNnet(),
+                                     decodable_opts.optimize_config);
+
+  KALDI_ASSERT(ClassifyRspecifier(hclg_fst_rxfilename, NULL, NULL) == kNoRspecifier);

Oh. This is a bug. Don't put any code inside the ASSERT. It compiles to nothing if NDEBUG is defined.

https://github.com/kaldi-asr/kaldi/blob/9d235864c3105c3b72feb9f19a219dbae08b3a41/src/base/kaldi-error.h#L184-L194
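
(A minimal sketch of the safe pattern, as a hypothetical rewrite rather than code from this PR: hoist the call out of the macro, so it still runs when NDEBUG strips the assert.)

// The call executes unconditionally; only the check disappears in NDEBUG builds.
RspecifierType hclg_type = ClassifyRspecifier(hclg_fst_rxfilename, NULL, NULL);
KALDI_ASSERT(hclg_type == kNoRspecifier);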


danpovey avatar Jun 23 '21 06:06 danpovey

Okay, I give up for the moment. Dan's comment about extending lattice-compose is right; I'll work on it later in a separate branch. And getting contextual adaptation working on the lattice level will be a good start.

One more thing: when I was going through Google papers, I noticed they are using a sigma-matcher for contextual adaptation. The boosting graph contains just the text snippets to be boosted, plus a sigma arc to match everything else.

I am not sure yet how difficult it would be to add support for sigma-composition to Kaldi, i.e. composing a lattice with an FST that has a sigma arc. But the 'wiring' might be analogous to the phi-composition that is already there... So, maybe one day... Let's see... (A sketch of what that might look like follows.)
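
A minimal sketch of such sigma-composition with OpenFst's SigmaMatcher, by analogy with how phi-composition is wired up; it uses plain StdArc FSTs for simplicity (real lattices use LatticeWeight), and kSigmaLabel plus all names here are assumptions, not code from this PR.

#include <fst/fstlib.h>

// Hypothetical dedicated symbol-id used on the catch-all arcs of the
// boosting graph; must not collide with real word-ids.
const fst::StdArc::Label kSigmaLabel = 1000000;

typedef fst::SigmaMatcher<fst::SortedMatcher<fst::Fst<fst::StdArc> > > SM;

fst::VectorFst<fst::StdArc> SigmaCompose(const fst::Fst<fst::StdArc> &lat,
                                         const fst::Fst<fst::StdArc> &boost) {
  // 'boost' must be ilabel-sorted for SortedMatcher to work.
  fst::ComposeFstOptions<fst::StdArc, SM> opts;
  opts.gc_limit = 0;  // cache everything; we expand the result below anyway
  opts.matcher1 = new SM(lat, fst::MATCH_NONE);                  // nothing special on the left
  opts.matcher2 = new SM(boost, fst::MATCH_INPUT, kSigmaLabel);  // sigma matches any label
  fst::ComposeFst<fst::StdArc> composed(lat, boost, opts);  // takes ownership of the matchers
  return fst::VectorFst<fst::StdArc>(composed);  // expand the lazy composition
}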

KarelVesely84 avatar Jun 23 '21 13:06 KarelVesely84

@vesis84 I am quite interested in this feature, and appreciate you making the PR. Regardless of whether it gets merged right now, I would be curious to see a full example including the "adaptation graph" FST. Even seeing a pre-prepared FST would be enlightening, if you don't want to release the code to generate them.

daanzu avatar Jun 24 '21 14:06 daanzu

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.

stale[bot] avatar Aug 23 '21 20:08 stale[bot]

I'm going to see if I can find someone to address the issues in this PR. I think it is an important topic that we should pursue.

danpovey avatar Jan 30 '22 02:01 danpovey

Ahoj, @vesis84, why? This is a very useful feature. Was it merged in another PR?

kkm000 avatar May 09 '22 14:05 kkm000

Hi, I thought it was not suitable for integration and the PR went stale. Also, part of this PR was reworked and integrated into #4692 (the lattice boosting part).

The remainder is the HCLG boosting part. It worked for me in the paper with noisy input data: http://www.fit.vutbr.cz/research/groups/speech/publi/2021/kocour21_interspeech.pdf With HCLG boosting, only isolated words could be boosted; for more complicated graphs, the composition would be very slow.

I was doing the composition here: https://github.com/vesis84/kaldi/blob/atco2/src/nnet3bin/nnet3-latgen-faster-compose.cc#L239 For this, the HCLG graph is internally converted from ConstFst to VectorFst, so it becomes mutable.
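
In rough outline, that composition looks something like the following sketch, with my own names (not the PR's), omitting the sorting and lookahead details of the real code:

#include <fst/fstlib.h>

// HCLG is copied once, at startup, from ConstFst into a mutable VectorFst;
// 'decode_fst' is the Fst<StdArc>* as loaded from disk.
fst::VectorFst<fst::StdArc> hclg(*decode_fst);

// Then, per utterance, with that utterance's boosting graph 'boost_fst'
// (ilabel-sorted, as in the MakeBoostingGraph() sketch above):
fst::VectorFst<fst::StdArc> boosted_hclg;
fst::Compose(hclg, boost_fst, &boosted_hclg);  // HCLG o B
// ... decode this utterance with 'boosted_hclg'.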

The HCLG boosting can also be dangerous: if the boosting values are too big, the recognition output easily becomes very bad. Lattice boosting usually produces decent results even with high boosting values.

Are you sure you want to continue with the HCLG boosting? Or what should be the direction to continue? I can work on that... Let me know what the intention is... (I can remove the lattice boosting part from the PR, create some examples...)

K.

KarelVesely84 avatar May 10 '22 15:05 KarelVesely84

I'll try to respond this week, unless @danpovey or @kkm000 have more specific ideas (sooner :) y.

On Tue, May 10, 2022 at 11:06 AM Karel Vesely @.***> wrote:

Reopened #4571 https://github.com/kaldi-asr/kaldi/pull/4571.


jtrmal avatar May 10 '22 15:05 jtrmal

I think that boosting individual words would already be a great thing! We used the Kaldi decoder to automate customer service for multiple clients, including clothing manufacturers and road assistance. The road-assistance application sometimes spat out stuff like "I need to change attire." A simple lexicon boosting would have resolved that.

More significant on-the-fly HCLG changes would likely be expensive.

kkm000 avatar Oct 11 '22 09:10 kkm000

yeah, I agree


jtrmal avatar Oct 12 '22 18:10 jtrmal

@vesis84, do you have the bandwidth to work on this? It would be a welcome feature.

jtrmal avatar Oct 12 '22 19:10 jtrmal