HATN
Test data in vocabulary preparation
Hi, congratulations on your amazing work. I have a question about the vocabulary preparation at line 47 of utils.py: the testing data is also used to build the vocabulary. Shouldn't the testing data be completely unseen? Please correct me if I am wrong.
Thank you
Testing data should be included in vocabulary preparation. Otherwise, the model cannot learn the semantics of target-specific words that appear only in the testing data.
If a large amount of unlabeled data is available in the target domain, the vocabulary of that unlabeled data is enough to cover the testing data of the target domain. In that case, I think it is not necessary to use the dictionary from the testing data.
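One way to check this on your own data is to build the vocabulary without the testing set and measure how many testing tokens it still covers. The snippet below is only a minimal sketch and not the code in utils.py: `build_vocab`, `coverage`, the whitespace tokenization, and the toy sentences are all assumptions made up for illustration.

```python
from collections import Counter

def build_vocab(corpora, min_count=1):
    """Map every token seen at least min_count times to an integer id (hypothetical helper)."""
    counts = Counter(
        tok for corpus in corpora for sent in corpus for tok in sent.lower().split()
    )
    vocab = {"<pad>": 0, "<unk>": 1}  # reserved ids; unseen words fall back to <unk>
    for tok, count in sorted(counts.items()):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab

def coverage(vocab, sentences):
    """Fraction of tokens in `sentences` that have an entry in `vocab` (hypothetical helper)."""
    toks = [tok for sent in sentences for tok in sent.lower().split()]
    if not toks:
        return 1.0
    return sum(tok in vocab for tok in toks) / len(toks)

# Toy data, invented for this example only.
source_train = ["the plot of this movie is great", "boring film with a weak plot"]
target_unlabeled = ["the battery life of this laptop is great", "weak battery and a noisy fan"]
target_test = ["the laptop fan is noisy", "great battery"]

# Vocabulary from source data only vs. source plus target unlabeled data.
vocab_src_only = build_vocab([source_train])
vocab_with_unlabeled = build_vocab([source_train, target_unlabeled])

cov_src = coverage(vocab_src_only, target_test)
cov_both = coverage(vocab_with_unlabeled, target_test)
print(f"source only: {cov_src:.2f}, with target unlabeled: {cov_both:.2f}")
```

If the coverage with the target unlabeled data is already close to 1.0 on your real data, leaving the testing set out of the vocabulary should change very little.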