HATN

Test data in vocabulary preparation

Open · avinashsai opened this issue 5 years ago • 2 comments

Hi, congratulations on your amazing work. I have a question about the vocabulary preparation at line 47 of utils.py: the testing data is also used to build the vocabulary. However, shouldn't the testing data be completely unseen? Please correct me if I am wrong.

Thank you

avinashsai, Dec 05 '19 17:12

Testing data should be used in vocabulary preparation. Otherwise, you cannot learn the semantics of target-specific words that appear in the testing data.

hsqmlzno1, Dec 05 '19 17:12
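The point about target-specific words can be illustrated with a minimal sketch. This is not the repository's utils.py: whitespace tokenization, the reserved `<pad>`/`<unk>` indices, the helper names `build_vocab` and `encode`, and the toy sentences are all hypothetical. It shows that a vocabulary built without the testing data maps every test-only word to the shared unknown index:

```python
# Hypothetical helpers for illustration; not taken from HATN's utils.py.
from collections import Counter

PAD, UNK = "<pad>", "<unk>"


def build_vocab(corpora, min_count=1):
    """Map every token seen at least min_count times across corpora to an index.
    Indices 0 and 1 are reserved for padding and unknown words; tokens are
    added in sorted order so the result is deterministic."""
    counts = Counter(tok for corpus in corpora for sent in corpus for tok in sent.split())
    vocab = {PAD: 0, UNK: 1}
    for tok, count in sorted(counts.items()):
        if count >= min_count:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Convert a whitespace-tokenized sentence to indices; unseen tokens fall back to <unk>."""
    return [vocab.get(tok, vocab[UNK]) for tok in sentence.split()]


# Toy data (hypothetical): movie-review source domain, electronics target domain.
source_train = ["the plot was great", "boring plot"]
target_test = ["the battery life was great"]

vocab_no_test = build_vocab([source_train])
vocab_with_test = build_vocab([source_train, target_test])

print(encode(target_test[0], vocab_no_test))    # [5, 1, 1, 6, 3]: battery, life -> <unk>
print(encode(target_test[0], vocab_with_test))  # [7, 2, 5, 8, 4]
```

Including the test sentences gives `battery` and `life` their own indices, but an index only carries semantics if its embedding is informed by something (for example pretrained vectors or training on unlabeled target text); the cost is that the test set has shaped the vocabulary, which is the concern raised in the question.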

If there is a large amount of unlabeled data in the target domain, its vocabulary should be enough to cover the testing data of the target domain. In that case, I think it is not necessary to use the dictionary built from the testing data.

hsqmlzno1, Dec 05 '19 23:12
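Whether the unlabeled target data is "enough" can be checked by measuring how many test tokens its vocabulary already covers. The sketch below assumes whitespace tokenization; the helper name `token_coverage` and the toy sentences are hypothetical, and the thread does not specify a coverage threshold:

```python
# Hypothetical helper for illustration; not part of the HATN repository.


def token_coverage(reference_sentences, test_sentences):
    """Return (coverage, oov_types): the fraction of test tokens (counting repeats)
    that occur in the reference vocabulary, and the sorted list of uncovered types."""
    vocab = {tok for sent in reference_sentences for tok in sent.split()}
    test_tokens = [tok for sent in test_sentences for tok in sent.split()]
    if not test_tokens:
        return 1.0, []
    covered = sum(tok in vocab for tok in test_tokens)
    oov_types = sorted({tok for tok in test_tokens if tok not in vocab})
    return covered / len(test_tokens), oov_types


# Toy data (hypothetical): unlabeled target-domain text vs. target-domain test set.
target_unlabeled = [
    "the battery life is long",
    "the screen was bright but the battery died",
]
target_test = ["the battery life was great"]

coverage, oov = token_coverage(target_unlabeled, target_test)
print(f"coverage={coverage:.2f}, oov={oov}")  # coverage=0.80, oov=['great']
```

With a realistically large unlabeled corpus, coverage close to 1.0 would support building the vocabulary without the testing data; the remaining out-of-vocabulary types show exactly which words would fall back to an unknown token.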