awd-lstm-lm
Dictionary - handling OOV tokens
I was looking into data.py and saw that the dictionary is built from all tokens in the train, val, and test files. I'm wondering whether adding tokens that appear only in the val/test files to the dictionary affects the test results in any way. Thanks!
Agreed. It could be okay for the benchmark dataset but seems problematic in a real scenario.
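For the real-scenario case, one common alternative is to build the vocabulary from the training tokens only and map anything unseen to a single unknown token when encoding. Below is a minimal sketch of that idea, not the code in data.py. The `Vocab` class, the `build_vocab` helper, and the `<unk>` token name are all assumptions made for illustration.

```python
class Vocab:
    """Hypothetical vocabulary built from training tokens only."""

    def __init__(self, unk_token="<unk>"):
        self.unk_token = unk_token
        self.word2idx = {}
        self.idx2word = []
        self.add_word(unk_token)  # reserve index 0 for unknown tokens

    def add_word(self, word):
        # Assign the next free index to a word not seen before.
        if word not in self.word2idx:
            self.word2idx[word] = len(self.idx2word)
            self.idx2word.append(word)
        return self.word2idx[word]

    def encode(self, tokens):
        # Tokens missing from the vocabulary fall back to the <unk> index.
        unk = self.word2idx[self.unk_token]
        return [self.word2idx.get(tok, unk) for tok in tokens]


def build_vocab(train_tokens):
    # Only training tokens are added; val/test tokens never extend the vocabulary.
    vocab = Vocab()
    for tok in train_tokens:
        vocab.add_word(tok)
    return vocab


train = "the cat sat on the mat".split()
test = "the dog sat".split()

vocab = build_vocab(train)
test_ids = vocab.encode(test)  # "dog" is unseen in train, so it maps to <unk>
```

With this setup the embedding and softmax sizes depend only on the training data, and a token seen only at test time is scored as `<unk>` rather than through an embedding row that was never trained.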