
[WIP] Add context graph building

[Open] ezerhouni opened this issue 3 years ago • 4 comments

This PR adds:

  1. An early, untested implementation of https://wenet.org.cn/wenet/context.html

ezerhouni, Nov 02 '22 16:11

@csukuangfj It is a very early version of the code (I started porting the C++ code to Python). Sorry for the poor state of the code.

ezerhouni, Nov 02 '22 16:11

@csukuangfj I have tried the code and could not get any improvement for OOV/rare words. When debugging, the context graph looked different from the one described at https://wenet.org.cn/wenet/context.html. I tried my own implementation for generating the graph (haven't pushed it yet) and it does not seem to improve things either. If you have any insight into what I am missing, that would be very useful. Thank you.

ezerhouni, Nov 14 '22 09:11

What scale did you use on the logprobs in the biasing graphs? We found that the scores need to be quite large to make a difference.

danpovey, Nov 14 '22 12:11

I tried various scales (-1000, 10, 1000) just to see some differences, but I think there is something inherently wrong in my implementation. FYI, this is what I am using now:

import kaldifst


def get_contextual_fst(
    context_words, sp, symbol_table, incremental_score, context_score
):
    """Build a biasing graph over BPE tokens, following
    https://wenet.org.cn/wenet/context.html."""
    contextual_fst = kaldifst.StdVectorFst()
    start_state = contextual_fst.add_state()
    assert start_state == 0
    contextual_fst.start = start_state

    for context_word in context_words:
        context_tokens = sp.encode(context_word, out_type=str)
        prev_state = start_state
        escape_score = 0.0

        for i, token in enumerate(context_tokens):
            word_id = symbol_table[token]
            next_state = contextual_fst.add_state()
            # Score added per matched token; it grows with the depth
            # of the partial match.
            score = (i + 1) * incremental_score + context_score
            escape_score += score
            contextual_fst.add_arc(
                prev_state,
                kaldifst.StdArc(word_id, word_id, score, next_state),
            )
            # Epsilon escape arc back to the start: cancel the
            # accumulated score if the partial match is abandoned.
            contextual_fst.add_arc(
                next_state,
                kaldifst.StdArc(0, 0, -escape_score, start_state),
            )
            prev_state = next_state

        # Epsilon arc that closes the word with one more score step.
        final_state = contextual_fst.add_state()
        score = incremental_score + context_score
        contextual_fst.add_arc(
            prev_state,
            kaldifst.StdArc(0, 0, score, final_state),
        )
        # Only the start state is marked final.
        contextual_fst.set_final(state=start_state, weight=1.0)

    contextual_fst = kaldifst.determinize(contextual_fst)
    return contextual_fst
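
For reference, a hypothetical way to call it; the model path, word list, and score values below are placeholders, not values from this PR:

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.load("bpe.model")  # placeholder path to a BPE model

# Map every BPE piece to its id so the graph and the model agree.
symbol_table = {sp.id_to_piece(i): i for i in range(sp.vocab_size())}

graph = get_contextual_fst(
    context_words=["NUTELLA", "KALDI"],  # placeholder biasing list
    sp=sp,
    symbol_table=symbol_table,
    incremental_score=1.0,  # placeholder scales; see the scale discussion above
    context_score=3.0,
)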

ezerhouni, Nov 14 '22 13:11

The function modified_graph_search_with_context(), was that copied and modified from something? It's hard for me to understand what is supposed to be happening; there are not many comments. Also, when it says words, does that really mean words, or does it just mean generic symbols? Because normally our models will output symbols, like BPE symbols or characters. And what was your testing setup for this? Sorry if this is all a little late.

danpovey, Jan 15 '23 14:01

> The function modified_graph_search_with_context(), was that copied and modified from something? It's hard for me to understand what is supposed to be happening; there are not many comments. Also, when it says words, does that really mean words, or does it just mean generic symbols? Because normally our models will output symbols, like BPE symbols or characters. And what was your testing setup for this? Sorry if this is all a little late.

I am still working on it (I had to pause the implementation). I am trying to follow something like https://www.isca-speech.org/archive_v0/Interspeech_2019/pdfs/1209.pdf. The implementation is not working at the moment, and I will keep debugging it later (and add more comments). modified_beam_search_with_context is a modification of modified_beam_search that adds the context graph. The idea is that at inference time we would like to boost some words (rare or out-of-vocabulary words, i.e. words not seen in the training set). I tried to do so using the context graph from the above paper. I will try to add more comments and a cleaner implementation soon (and hopefully a functioning one).
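
To make that concrete, here is a minimal sketch of the bookkeeping such a search needs; ContextState and advance() below are hypothetical illustrations, not code from this PR. Each beam-search hypothesis carries a state in the context graph, and extending a hypothesis with a token both moves that state and adjusts the hypothesis score:

from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ContextState:
    # Trie node over BPE tokens: token_id -> (child node, score for the arc).
    next: Dict[int, Tuple["ContextState", float]] = field(default_factory=dict)
    # Accumulated score to cancel if the partial match is abandoned.
    escape_score: float = 0.0


def advance(state: ContextState, root: ContextState, token_id: int):
    # Per-hypothesis update inside the beam search: return the new
    # context state and the score to add to the hypothesis log-prob.
    if token_id in state.next:
        child, bonus = state.next[token_id]
        return child, bonus
    # The token leaves the current partial match: fall back to the
    # root and cancel the accumulated bonus.
    return root, -state.escape_score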

ezerhouni, Jan 15 '23 18:01

BTW, we have generally not got as much improvement as we would have hoped from doing things of this general nature, but it does help a bit. @glynpu and @pkufool, I think, were working on an approach where we do the "fast beam search", which is really a kind of FSA decoding, to generate a lattice, using a graph that boosts the probability of certain words. The symbols we care about in the graph, though, are the word-pieces, not the words themselves. Those would be the 'labels' in the graph; the olabels would be the 'aux_labels'.
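
As a rough sketch of that labeling convention (illustrative only; add_word_path and its arguments are hypothetical, and kaldifst is used here only because it appears earlier in the thread), a biasing path for one word puts the word-piece ids on the input side and emits the word id on its last arc:

import kaldifst


def add_word_path(fst, start, piece_ids, word_id, bonus):
    # One left-to-right path for a biased word: input labels are
    # word-piece ids; the output label (the word id) is emitted on
    # the final arc. OpenFst tropical weights are costs, so a bonus
    # is a negative weight, split evenly across the arcs.
    assert len(piece_ids) > 0
    per_arc = bonus / len(piece_ids)
    prev = start
    for i, piece in enumerate(piece_ids):
        nxt = fst.add_state()
        olabel = word_id if i == len(piece_ids) - 1 else 0
        fst.add_arc(prev, kaldifst.StdArc(piece, olabel, -per_arc, nxt))
        prev = nxt
    return prev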

danpovey, Jan 16 '23 03:01

As you mentioned, I think the general implementation is to use lattices and rescore with a "special" G. According to https://arxiv.org/pdf/2203.15455.pdf, the context-graph approach seems to work relatively well. In your experiments, did you try a "classical" G.fst or the context-graph approach (i.e. with failure arcs)?
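
For what it's worth, the operational difference can be stated as decoding semantics. A minimal sketch, assuming the graph is a dict mapping state -> {label: (next_state, score)} and PHI is a hypothetical reserved failure label:

PHI = -1  # hypothetical reserved label marking a failure arc


def step(graph, state, token_id):
    # Follow token_id out of `state`. A failure arc is taken only when
    # no other arc matches, whereas an epsilon escape arc (as in the
    # code earlier in this thread) can be taken by the search at any time.
    total = 0.0
    while True:
        arcs = graph[state]
        if token_id in arcs:
            next_state, score = arcs[token_id]
            return next_state, total + score
        if PHI not in arcs:
            return None  # no match and no failure arc: dead end
        state, score = arcs[PHI]
        total += score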

ezerhouni, Jan 16 '23 07:01

@ezerhouni Do you have any results for this work? Does the code work now?

cuongducle, Apr 12 '23 04:04

> @ezerhouni Do you have any results for this work? Does the code work now?

I have implemented a version very similar to this PR and will make a PR soon.

pkufool, Apr 12 '23 06:04

> @ezerhouni Do you have any results for this work? Does the code work now?

TBH, it is not working, and we are testing another implementation. I will try to see if we can push it here.

ezerhouni, Apr 17 '23 11:04