ctc_beamsearch

Trying to understand some parameters

Open pabloapast opened this issue 7 years ago • 0 comments

Hi there! I was trying to understand the beam search algorithm by reading these papers: "First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs" and "Lexicon-Free Conversational Speech Recognition with Neural Networks". I implemented the algorithm as described in the first paper, but it fails because some probabilities are missing, and I think those are the ones that you are setting to "-1e10". Here's my code:

class BeamSearch(object):
    """
    Decoder for audio to text.

    From: https://arxiv.org/pdf/1408.2873.pdf (hardcoded)
    """
    def __init__(self, alphabet='" abcdefghijklmnopqrstuvwxyz'):
        # blank symbol plus alphabet (self.blank is used in decode)
        self.blank = '-'
        self.alphabet = self.blank + alphabet
        # index of each char
        self.char_to_index = {c: i for i, c in enumerate(self.alphabet)}

    def decode(self, probs, k=100):
        """
        Decoder.

        :param probs: matrix of size Windows X AlphaLength
        :param k: beam size
        :returns: most probable prefix in A_prev
        """
        # List of prefixes, initialized with the empty string
        A_prev = ['']
        # Probability that a prefix ends in blank at time window t
        p_b = {('', 0): 1.0}
        # Probability that a prefix does not end in blank at time window t
        p_nb = {('', 0): 0.0}

        # for each time window t
        for t in range(1, probs.shape[0] + 1):
            A_new = []
            # for each prefix
            for s in A_prev:
                for c in self.alphabet:
                    if c == '-':
                        p_b[(s, t)] = probs[t-1][self.char_to_index[self.blank]] *\
                                        (p_b[(s, t-1)] +\
                                            p_nb[(s, t-1)])
                        A_new.append(s)
                    else:
                        s_new = s + c
                        # repeated chars
                        if len(s) > 0 and c == s[-1]:
                            p_nb[(s_new, t)] = probs[t-1][self.char_to_index[c]] *\
                                                p_b[(s, t-1)]
                            p_nb[(s, t)] = probs[t-1][self.char_to_index[c]] *\
                                                p_b[(s, t-1)]
                        # spaces
                        elif c == ' ':
                            p_nb[(s_new, t)] = probs[t-1][self.char_to_index[c]] *\
                                               (p_b[(s, t-1)] +\
                                                p_nb[(s, t-1)])
                        else:
                            p_nb[(s_new, t)] = probs[t-1][self.char_to_index[c]] *\
                                                (p_b[(s, t-1)] +\
                                                    p_nb[(s, t-1)])
                            p_nb[(s, t)] = probs[t-1][self.char_to_index[c]] *\
                                                (p_b[(s, t-1)] +\
                                                    p_nb[(s, t-1)])
                        if s_new not in A_prev:
                            p_b[(s_new, t)] = probs[t-1][self.char_to_index[self.blank]] *\
                                                (p_b[(s, t-1)] +\
                                                    p_nb[(s, t-1)])
                            p_nb[(s_new, t)]  = probs[t-1][self.char_to_index[c]] *\
                                                    p_nb[(s, t-1)]
                        A_new.append(s_new)
            
            s_probs = map(lambda x: (x, (p_b[(x, t)] + p_nb[(x, t)])*len(x)), A_new)
            xs = sorted(s_probs, key=lambda x: x[1], reverse=True)[:k]
            A_prev, best_probs = zip(*xs)
        return A_prev[0], best_probs[0]

Thanks in advance! Pablo.

pabloapast, Mar 14 '17 18:03
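
A note on the missing probabilities: the lookups `p_b[(s, t-1)]` and `p_nb[(s, t-1)]` raise a KeyError for any prefix that first appeared at time t-1 and only had one of the two values stored. One way around this is to treat every entry that was never written as probability zero. If the "-1e10" mentioned above is a log-space stand-in for probability zero (an assumption, not checked against the repository), then a `defaultdict` that starts at 0.0 plays the same role in linear space. Below is a minimal sketch of plain CTC prefix beam search along those lines. It leaves out the language model and the length weighting from the paper, and the names `prefix_beam_search`, `probs`, `alphabet`, `blank` and `k` are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def prefix_beam_search(probs, alphabet, blank=0, k=100):
    """Sketch of CTC prefix beam search without a language model.

    probs:    list of rows, one per time window, each a list of
              per-symbol probabilities (hypothetical input layout)
    alphabet: string whose index `blank` is the blank symbol
    k:        beam size
    """
    # prefix (tuple of symbol indices) -> (p ending in blank, p not ending in blank)
    beam = {(): (1.0, 0.0)}
    for row in probs:
        # Any prefix not written yet starts at probability zero,
        # so no lookup can fail with a KeyError.
        next_beam = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beam.items():
            for c, p in enumerate(row):
                if c == blank:
                    # Blank keeps the prefix unchanged.
                    next_beam[prefix][0] += p * (p_b + p_nb)
                elif prefix and prefix[-1] == c:
                    # Repeated symbol: it extends the prefix only if a
                    # blank separated the two occurrences ...
                    next_beam[prefix + (c,)][1] += p * p_b
                    # ... otherwise it collapses into the same prefix.
                    next_beam[prefix][1] += p * p_nb
                else:
                    # Any other symbol extends the prefix.
                    next_beam[prefix + (c,)][1] += p * (p_b + p_nb)
        # Keep the k most probable prefixes.
        best = sorted(next_beam.items(),
                      key=lambda kv: kv[1][0] + kv[1][1],
                      reverse=True)[:k]
        beam = {prefix: (pb, pnb) for prefix, (pb, pnb) in best}
    best_prefix, (pb, pnb) = max(beam.items(), key=lambda kv: sum(kv[1]))
    return ''.join(alphabet[i] for i in best_prefix), pb + pnb


# Tiny example: two time windows over the alphabet '-ab' (blank at index 0).
text, score = prefix_beam_search([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]], '-ab')
print(text, round(score, 4))
```

Storing both values per prefix in a single mapping, and rebuilding it at every time window, also avoids keeping `(prefix, t)` keys for all earlier windows.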