awd-lstm-lm

Dictionary - handling OOV tokens

Open · chiphuyen opened this issue 6 years ago · 1 comment

I was looking into data.py and saw that the dictionary is built from all tokens in the train, val, and test files. Will adding tokens that appear only in the val/test files to the dictionary affect testing in any way? Thanks!

chiphuyen · Jul 19 '18 19:07

Agreed. This may be fine for a benchmark dataset, but it seems problematic in a real scenario, where the test tokens are not known in advance.

gyuwankim · Aug 31 '18 09:08
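
For readers landing on this thread, here is a minimal sketch of the alternative being discussed: build the dictionary from the training tokens only and map anything unseen in val/test to an unknown-word token. This is not the repository's data.py; the `Vocab` class, its `encode` method, the `<unk>` default, and the toy sentences are all assumptions made for illustration.

```python
class Vocab:
    """Hypothetical vocabulary built from training tokens only (not the repo's Dictionary)."""

    def __init__(self, unk="<unk>"):
        self.unk = unk
        # Reserve index 0 for the unknown-word token.
        self.word2idx = {unk: 0}
        self.idx2word = [unk]

    def add_word(self, word):
        # Register a word if it is new and return its index.
        if word not in self.word2idx:
            self.word2idx[word] = len(self.idx2word)
            self.idx2word.append(word)
        return self.word2idx[word]

    def encode(self, tokens):
        # Map tokens to ids; anything not in the vocabulary becomes <unk>.
        unk_id = self.word2idx[self.unk]
        return [self.word2idx.get(t, unk_id) for t in tokens]


# Toy data standing in for the train and test files (assumed, not from the repo).
train_tokens = "the cat sat on the mat <eos>".split()
test_tokens = "the dog sat on the rug <eos>".split()

# Only training tokens are added to the vocabulary.
vocab = Vocab()
for tok in train_tokens:
    vocab.add_word(tok)

train_ids = vocab.encode(train_tokens)
test_ids = vocab.encode(test_tokens)

# Tokens that occur in test but were never seen in training.
oov = [t for t in test_tokens if t not in vocab.word2idx]
print(len(vocab.idx2word), test_ids, oov)
```

With this setup the vocabulary size, and therefore the size of the embedding and output layers, is fixed by the training data alone, so val/test files cannot change the model. On benchmark files where rare words are already replaced by `<unk>` in every split, the two approaches may well yield the same vocabulary, which would be consistent with the comment above that the current behavior could be okay for benchmarks.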