debug_seq2seq

meaningful result?

Open liuchenxjtu opened this issue 8 years ago • 15 comments

Hi Nicolas, first of all thanks a lot for your work. When I run your code, I cannot get meaningful results; all I get is output like:

INFO:lib.nn_model.train:[why ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[who ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[yeah ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[what is it ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[why not ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[really ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[huh ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[yes ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[what ' s that ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[what are you doing ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[what are you talking about ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[what happened ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[hello ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[where ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[how ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[excuse me ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as i i]
INFO:lib.nn_model.train:[who are you ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[what do you want ?] -> [i ' . . $$$ . $$$ $$$ $$$ $$$ as as as as as i]
INFO:lib.nn_model.train:[what ' s wrong ?] -> [i ' . . $$$ .

or

INFO:lib.nn_model.train:[what are you talking about ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[what happened ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[hello ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[where ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[how ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[excuse me ?] -> [i ' . . . . . . . . , , , , , ,]
INFO:lib.nn_model.train:[who are you ?] -> [i ' . . . . . . . . , , , , , ,]

Could you share your opinion with me? I would really appreciate it.

liuchenxjtu avatar Jan 20 '16 08:01 liuchenxjtu

@liuchenxjtu how many iterations have you finished when you trained the model?

nextdawn avatar Jan 20 '16 12:01 nextdawn

Thanks for your reply. About 20. It is very slow on my machine. How many do you suggest? Do you have some sample results for different numbers of iterations?

On 2016/01/20, at 21:53, nextdawn [email protected] wrote:

@liuchenxjtu how many iterations have you finished when you trained the model?


liuchenxjtu avatar Jan 20 '16 13:01 liuchenxjtu

Guys, I got similarly lame results yesterday... My guess is that there are some fundamental problems with this approach:

  • Since word2vec vectors are used as word representations and the model returns an approximate vector for every next word, the error accumulates from one word to the next, so starting from roughly the third word the model fails to predict anything meaningful... This problem might be overcome if, at every timestep, we replace our approximate word2vec vector with a "correct" vector, i.e. the one that corresponds to an actual word from the dictionary (see the first sketch further down in this comment). Does it make sense? However, you need to dig into the seq2seq code to do that. @farizrahman4u could be quite helpful here.
  • The second problem relates to word sampling: even if you manage to solve the aforementioned issue, if you stick to using argmax() to pick the most probable word at every timestep, the answers are going to be too simple and not interesting, like:
are you a human?            -- no .
are you a robot or human?   -- no .
are you a robot?            -- no .
are you better than siri?       -- yes .
are you here ?              -- yes .
are you human?          -- no .
are you really better than siri?    -- yes .
are you there               -- you ' re not going to be
are you there?!?!           -- yes .

Not to mislead you: these results were achieved on a different seq2seq architecture, based on tensorflow.
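
To make the first point more concrete, here is a minimal sketch of what I mean by snapping the decoder output back to a real dictionary word at every timestep. The decode_step() function, the initial state and the start vector are hypothetical placeholders; the only real assumption is a gensim-style word2vec model that supports similar_by_vector():

def snap_to_vocabulary(predicted_vector, w2v_model):
    # Replace the approximate predicted vector with the vector of the
    # closest real word from the word2vec vocabulary.
    word, _score = w2v_model.similar_by_vector(predicted_vector, topn=1)[0]
    return word, w2v_model[word]

def greedy_decode(decode_step, w2v_model, initial_state, start_vector, max_len=16):
    # Feed the snapped vector back into the decoder instead of the raw
    # network output, so the approximation error does not accumulate.
    state = initial_state
    prev_vector = start_vector
    answer = []
    for _ in range(max_len):
        predicted_vector, state = decode_step(prev_vector, state)
        word, prev_vector = snap_to_vocabulary(predicted_vector, w2v_model)
        answer.append(word)
    return answer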

Sampling with temperature could be used to diversify the output results; however, that again should be done inside the seq2seq library.
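
A minimal sketch of sampling with temperature, assuming probs is the softmax distribution over the dictionary for one timestep; temperatures close to 0 behave like argmax(), higher values diversify the answers:

import numpy as np

def sample_with_temperature(probs, temperature=0.7):
    # Rescale the distribution: t -> 0 approaches argmax, t = 1 keeps the
    # original distribution, t > 1 flattens it and diversifies the output.
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-10) / temperature
    exp_logits = np.exp(logits - np.max(logits))  # numerically stable softmax
    rescaled = exp_logits / exp_logits.sum()
    return np.random.choice(len(rescaled), p=rescaled)

So instead of calling argmax() on the output distribution, you would call sample_with_temperature(probs, 0.7) for every generated word.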

nicolas-ivanov avatar Jan 20 '16 13:01 nicolas-ivanov

@nicolas-ivanov Did you try the other models? Seq2seq, Seq2seq with peek, Attention Seq2seq etc?

farizrahman4u avatar Jan 20 '16 13:01 farizrahman4u

I recently tested attention seq2seq on the babi dataset and it worked (100% val acc).

farizrahman4u avatar Jan 20 '16 13:01 farizrahman4u

@farizrahman4u not yet, I'll set up the experiment with Attention Seq2seq now. Meanwhile, could you please post a link to your dataset here, and some example results?

nicolas-ivanov avatar Jan 20 '16 13:01 nicolas-ivanov

The standard babi dataset from Facebook (used by Keras in the examples). I did it using a slightly different layer, but the idea is almost the same as attention seq2seq. I will post the code in a few days, as I have not tested it on all the babi tasks yet.

farizrahman4u avatar Jan 20 '16 13:01 farizrahman4u

Hello @farizrahman4u, I tried using the attention seq2seq model but got a ShapeMismatch error. This error doesn't occur when using the SimpleSeq2seq model. Is there anything that I'm missing?

liveabstract avatar Feb 18 '16 20:02 liveabstract

Please post your code.

farizrahman4u avatar Feb 19 '16 04:02 farizrahman4u

@farizrahman4u: The following code is from the model.py file; I haven't changed much apart from the model name (the attention swap is shown right after the function):

import os.path

from keras.models import Sequential
from seq2seq.models import AttentionSeq2seq
from seq2seq.models import SimpleSeq2seq
from seq2seq.models import Seq2seq

from configs.config import TOKEN_REPRESENTATION_SIZE, HIDDEN_LAYER_DIMENSION, SAMPLES_BATCH_SIZE, \
    INPUT_SEQUENCE_LENGTH, ANSWER_MAX_TOKEN_LENGTH, NN_MODEL_PATH
from utils.utils import get_logger

_logger = get_logger(__name__)


def get_nn_model(token_dict_size):
    _logger.info('Initializing NN model with the following params:')
    _logger.info('Input dimension: %s (token vector size)' % TOKEN_REPRESENTATION_SIZE)
    _logger.info('Hidden dimension: %s' % HIDDEN_LAYER_DIMENSION)
    _logger.info('Output dimension: %s (token dict size)' % token_dict_size)
    _logger.info('Input seq length: %s ' % INPUT_SEQUENCE_LENGTH)
    _logger.info('Output seq length: %s ' % ANSWER_MAX_TOKEN_LENGTH)
    _logger.info('Batch size: %s' % SAMPLES_BATCH_SIZE)

    model = Sequential()
    seq2seq = SimpleSeq2seq(
        input_dim=TOKEN_REPRESENTATION_SIZE,
        input_length=INPUT_SEQUENCE_LENGTH,
        hidden_dim=HIDDEN_LAYER_DIMENSION,
        output_dim=token_dict_size,
        output_length=ANSWER_MAX_TOKEN_LENGTH,
        depth=3
    )

    model.add(seq2seq)
    model.compile(loss='mse', optimizer='rmsprop')

    model.save_weights(NN_MODEL_PATH)

    # use previously saved model if it exists
    _logger.info('Looking for a model %s' % NN_MODEL_PATH)

    if os.path.isfile(NN_MODEL_PATH):
        _logger.info('Loading previously calculated weights...')
        model.load_weights(NN_MODEL_PATH)

    _logger.info('Model is built')
    return model
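
For the attention run, the only change was swapping the constructor, like this. I assumed AttentionSeq2seq takes the same keyword arguments as SimpleSeq2seq; maybe that assumption is what causes the shape mismatch?

    seq2seq = AttentionSeq2seq(
        input_dim=TOKEN_REPRESENTATION_SIZE,
        input_length=INPUT_SEQUENCE_LENGTH,
        hidden_dim=HIDDEN_LAYER_DIMENSION,
        output_dim=token_dict_size,
        output_length=ANSWER_MAX_TOKEN_LENGTH,
        depth=3
    )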

liveabstract avatar Feb 19 '16 06:02 liveabstract

Hi @nicolas-ivanov, you mentioned that those results were obtained with a TensorFlow-based model. What were the datasets and other settings? What were the initial perplexity and the converged perplexity on both the training set and the validation set? I am trying to adapt the translation model example from TensorFlow to train a chatbot; could you give some details on these? Thanks.

tilneyyang avatar Apr 01 '16 07:04 tilneyyang

It may have been due to too few training iterations or too little data.

changukshin avatar May 06 '16 11:05 changukshin

Hi, has anyone managed to run this project successfully?

Firstly, when I ran this project, I got the log below:

Epoch 1/1
32/32 [==============================] - 0s - loss: nan
Epoch 1/1
32/32 [==============================] - 0s - loss: nan
Epoch 1/1
32/32 [==============================] - 0s - loss: nan
Epoch 1/1
32/32 [==============================] - 0s - loss: nan
Epoch 1/1
32/32 [==============================] - 0s - loss: nan
INFO:lib.nn_model.train:[Hi!] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[Hi] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[why ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[who ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[yeah ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what is it ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[why not ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[really ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[huh ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[yes ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what ' s that ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what are you doing ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what are you talking about ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what happened ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[hello ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[where ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[how ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[excuse me ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[who are you ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what do you want ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what ' s wrong ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[so ?] -> [raining raining raining raining raining raining]

Secondly, I changed the model code from SimpleSeq2seq to AttentionSeq2seq.
I found a small difference: it now prints a time per epoch, but the result is still wrong.

Epoch 1/1
32/32 [==============================] - 3s - loss: nan
Epoch 1/1
32/32 [==============================] - 3s - loss: nan
Epoch 1/1
32/32 [==============================] - 3s - loss: nan
Epoch 1/1
32/32 [==============================] - 3s - loss: nan
Epoch 1/1
32/32 [==============================] - 3s - loss: nan
INFO:lib.nn_model.train:[Hi!] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[Hi] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[why ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[who ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[yeah ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[what is it ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[why not ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[really ?] -> [raining raining raining raining raining raining]
INFO:lib.nn_model.train:[huh ?] -> [raining raining raining raining raining raining]

Thanks a lot.

KevinYuk avatar Oct 10 '16 01:10 KevinYuk

@KevinYuk I got the same "raining" result! Do you have any insight?

lijuncheng16 avatar Dec 19 '16 20:12 lijuncheng16

Just my opinion: repeating the same word usually means the model is not yet fitted. It also reports 'loss: nan', which indicates something is wrong (a very high or diverging loss, or similar). Please reconsider your hyperparameters. Could you post them here?
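
For example, a first thing to try (just a sketch with hypothetical values, based on the model.py posted above) would be a smaller learning rate and gradient clipping when compiling the model:

from keras.optimizers import RMSprop

# smaller learning rate plus gradient clipping, to help avoid the nan loss
optimizer = RMSprop(lr=1e-4, clipnorm=1.0)
model.compile(loss='mse', optimizer=optimizer)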

changukshin avatar Dec 20 '16 03:12 changukshin