tf_chatbot_seq2seq_antilm

Is the beam search decoding correct?

SaeedNajafi opened this issue 7 years ago

Hello, I am just curious how you implemented beam search decoding. Did you use the output logits of the default Seq2Seq model? I am working through these dialogue papers and trying to implement beam search on top of TensorFlow's Seq2Seq model. The problem is in the file "seq2seq.py", which is TensorFlow's implementation of the sequence-to-sequence model. In that file, the loop_function finds the best word from the previous step and feeds it into the next step.

# (Quoted from TensorFlow's seq2seq.py; these helpers live under
# tensorflow.python.ops in the original file.)
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import embedding_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import nn_ops


def _extract_argmax_and_embed(embedding,
                              output_projection=None,
                              update_embedding=True):
  """Get a loop_function that extracts the previous symbol and embeds it.
  Args:
    embedding: embedding tensor for symbols.
    output_projection: None or a pair (W, B). If provided, each fed previous
      output will first be multiplied by W and added B.
    update_embedding: Boolean; if False, the gradients will not propagate
      through the embeddings.
  Returns:
    A loop function.
  """

  def loop_function(prev, _):
    if output_projection is not None:
      prev = nn_ops.xw_plus_b(prev, output_projection[0], output_projection[1])
    prev_symbol = math_ops.argmax(prev, 1)
    # Note that gradients will not propagate through the second parameter of
    # embedding_lookup.
    emb_prev = embedding_ops.embedding_lookup(embedding, prev_symbol)
    if not update_embedding:
      emb_prev = array_ops.stop_gradient(emb_prev)
    return emb_prev

  return loop_function

The issue with using the output logits of the Seq2Seq model to implement beam search is that if you select the top B words at step t-1 (B is the beam size), the logits you get at step t are conditioned only on the single best word fed back from step t-1, not on each of the B candidates. For example, say at step 1 of decoding you have selected the top B words (W1, W2, ..., WB) and the best word (the one with the highest logit) is W_STAR. At step 2 we select another B words (M1, M2, ..., MB) and now want to keep the top B of the B*B combinations. The problem is that the output logit for a word Mi is actually P(Mi|W_STAR). So how can we compute the score for the sequence W1 Mi? What we can compute is P(W1) * P(Mi|W_STAR), which is not P(W1) * P(Mi|W1).

I think we have to change the architecture of the seq2seq model to enable beam search decoding, as discussed here: https://github.com/tensorflow/tensorflow/issues/654
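For what it's worth, here is a minimal sketch (plain Python/NumPy, not the repo's code) of the bookkeeping a correct beam search needs: every partial hypothesis carries its own decoder state, so when it is extended, the next-word distribution is conditioned on that hypothesis's own last word rather than on the global argmax. `step_fn`, `init_state`, and the token arguments are hypothetical placeholders for whatever the decoder exposes.

import numpy as np

def beam_search(step_fn, init_state, start_token, end_token, beam_size, max_len):
    # Each hypothesis is (tokens, cumulative log-prob, decoder state).
    beams = [([start_token], 0.0, init_state)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score, state in beams:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
                continue
            # step_fn (hypothetical) returns log-probs over the vocab and the
            # new state, conditioned on THIS hypothesis's state and last word.
            log_probs, new_state = step_fn(state, tokens[-1])
            # Expand this hypothesis with its own top-k continuations.
            for tok in np.argsort(log_probs)[-beam_size:]:
                candidates.append((tokens + [int(tok)],
                                   score + float(log_probs[tok]),
                                   new_state))
        if not candidates:
            break
        # Keep the best `beam_size` partial hypotheses overall.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.extend((tokens, score) for tokens, score, _ in beams)
    return max(finished, key=lambda f: f[1])

With the stock loop_function above there is only one state and one fed-back word per step, which is exactly why the scores come out as P(W1) * P(Mi|W_STAR).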

SaeedNajafi avatar Apr 15 '17 19:04 SaeedNajafi

I would like to know if the beam search implementation here is correct or not. Thanks a lot.

PratsBhatt avatar Jul 27 '17 13:07 PratsBhatt

@SaeedNajafi Well, if we take the conditional probability of generating the next word given the context, it is not only the previous word that it accounts for. So the best way to find the combination with the highest score is to multiply the probabilities of all the generated words in each candidate sequence and take the sequence with the highest product.
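As a toy illustration (hypothetical numbers, not the repo's code), that product is usually computed in log space to avoid underflow, so it becomes a sum of per-step log-probabilities:

import math

def sequence_score(step_log_probs):
    """step_log_probs: list of log P(w_t | w_1..w_{t-1}, context), one per step."""
    return sum(step_log_probs)

# Compare two 3-word candidates by their per-step log-probabilities (made-up values).
candidate_a = [math.log(0.5), math.log(0.4), math.log(0.3)]
candidate_b = [math.log(0.6), math.log(0.2), math.log(0.2)]
best = max([candidate_a, candidate_b], key=sequence_score)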

shamanez avatar Jul 31 '17 19:07 shamanez

@SaeedNajafi

Can you please answer this one as well? It would be a great help. Issue #27

shamanez avatar Jul 31 '17 19:07 shamanez