Unsupervised-Aspect-Extraction

What happened to unknown words?

Open • zhangriqi opened this issue 6 years ago • 2 comments

Hello, I noticed there's an "<unk>" entry in the dictionary, and that we only keep the most frequent words in the dictionary. But I don't really understand what happens to new words that appear only in the test data and not in the training set (they all become "<unk>", is that right?). Please tell me what I'm missing. Appreciate it.

zhangriqi • May 31 '19 10:05

Words not in the vocab are mapped to a special token "<unk>".
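
For illustration, a minimal sketch of that kind of lookup. This is not the repository's actual code: the `vocab` contents, the index values, and the `encode` helper are all hypothetical, assuming a plain dict that reserves entries for the special tokens.

```python
from typing import Dict, List

# Hypothetical vocabulary (an assumption for this sketch): special tokens
# first, followed by the most frequent words seen in the training data.
PAD, UNK = "<pad>", "<unk>"
vocab: Dict[str, int] = {PAD: 0, UNK: 1, "food": 2, "service": 3, "great": 4}


def encode(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map each token to its index, falling back to the "<unk>" index."""
    unk_id = vocab[UNK]
    return [vocab.get(tok, unk_id) for tok in tokens]


# "sushi" is not in the vocabulary, so it gets the "<unk>" index.
ids = encode(["great", "sushi", "service"], vocab)
```

Under these assumptions, a word that only shows up at test time takes the same index as any other out-of-vocabulary word.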

ruidan • Jun 03 '19 05:06

How is "<unk>" mapped to embeddings? Is there a place for it in the embedding matrix? Same question for padding.

Gwynny • Dec 12 '20 12:12