windbag
handle incremental dictionary
Currently we generate a new dictionary, mapping words to ids, for each dataset. If new words come in later, we have to retrain the whole model.
I am considering creating a dictionary with some placeholder slots that stand for future, not-yet-known words. Since the Cornell-Movie dataset has about 24k words, creating a 100,000-word dictionary seems like a reasonable starting size for now.
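The idea above could be sketched roughly as follows. This is a hypothetical illustration, not code from windbag: a fixed-capacity word-to-id dictionary whose unused ids act as the placeholder slots, so the embedding table never needs to be resized when new words show up.

```python
# Hypothetical sketch: a fixed-capacity dictionary with placeholder ids
# reserved for future-known words. Class and method names are illustrative.

class IncrementalDictionary:
    def __init__(self, capacity=100_000):
        self.capacity = capacity       # pre-allocated embedding table size
        self.word_to_id = {}

    def add(self, word):
        """Assign the next free placeholder id to an unseen word."""
        if word in self.word_to_id:
            return self.word_to_id[word]
        if len(self.word_to_id) >= self.capacity:
            raise ValueError("dictionary capacity exhausted")
        new_id = len(self.word_to_id)
        self.word_to_id[word] = new_id
        return new_id

    def lookup(self, word):
        # Words added later still get ids inside the pre-allocated range,
        # so the model does not need retraining just to accept them.
        return self.word_to_id.get(word)

d = IncrementalDictionary(capacity=100_000)
d.add("hello")
d.add("world")
```

New words consume the reserved ids one by one, so the table size (100k here) bounds how many future words can be absorbed before a real retrain is needed.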
There is still an issue with these future-known words: they are never seen in the training dataset, yet the predicted answers may contain them. I think we can mark these words as
Found something helpful: tf.contrib.layers.scattered_embedding_lookup.
Reference: http://arxiv.org/pdf/1504.04788.pdf
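The referenced paper describes the hashing trick, which is the idea behind that lookup: instead of one embedding row per dictionary entry, every word is hashed into a fixed number of buckets, so even a completely unseen word maps to a valid row. A minimal sketch of the mapping step, with an illustrative bucket count and hash choice (not windbag's actual settings):

```python
# Sketch of the hashing trick: hash any word, seen or unseen, into a
# fixed-size embedding table. Bucket count below is an assumption.

import hashlib

NUM_BUCKETS = 1024  # fixed table size, independent of vocabulary growth

def word_to_bucket(word, num_buckets=NUM_BUCKETS):
    """Deterministically map a word to an embedding row index."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets
```

The trade-off is collisions: distinct words can share a row, which the paper argues is acceptable in exchange for a table whose size no longer depends on the vocabulary.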