
The usage of '<oov>' is not consistent with the paper

Open plasmashen opened this issue 4 years ago • 3 comments

In the paper, a word's importance score is calculated by removing the word, but the code replaces the word with '<oov>' to calculate the importance score, see https://github.com/jind11/TextFooler/blob/master/attack_classification.py#L216

Moreover, '<oov>' will be tokenized into 4 tokens, which may have attention effects on other words. I'm wondering why such a nonsensical '<oov>' token is used?
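
To make the two variants concrete, here is a minimal sketch of both ways of scoring a word. It is not the repository's code: `importance_scores`, `toy_predict_proba`, and the `mode` parameter are hypothetical names, the classifier is a toy stand-in, and the score is simplified to the drop in the true-class probability (any extra handling for a flipped prediction is omitted).

```python
from typing import Callable, List, Sequence


def importance_scores(
    words: Sequence[str],
    predict_proba: Callable[[List[str]], List[float]],
    label: int,
    mode: str = "delete",
    placeholder: str = "<oov>",
) -> List[float]:
    """Score each word by how much the true-class probability drops
    when the word is deleted or replaced by a placeholder token."""
    words = list(words)
    base = predict_proba(words)[label]
    scores = []
    for i in range(len(words)):
        if mode == "delete":
            perturbed = words[:i] + words[i + 1:]
        elif mode == "replace":
            perturbed = words[:i] + [placeholder] + words[i + 1:]
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        scores.append(base - predict_proba(perturbed)[label])
    return scores


def toy_predict_proba(words: List[str]) -> List[float]:
    """Toy stand-in classifier (an assumption for illustration only):
    P(positive) is the fraction of words found in a tiny positive lexicon."""
    positive = {"great", "good"}
    p_pos = sum(w in positive for w in words) / (len(words) or 1)
    return [1.0 - p_pos, p_pos]


sentence = ["a", "great", "movie"]
delete_scores = importance_scores(sentence, toy_predict_proba, label=1, mode="delete")
replace_scores = importance_scores(sentence, toy_predict_proba, label=1, mode="replace")
print(delete_scores, replace_scores)
```

With this toy model both variants rank "great" highest, but the scores of the other words differ, because deletion changes the sentence length while replacement does not.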

plasmashen, Jan 04 '21 09:01

Hi, I have tested both methods, removing the word and replacing it with "<oov>", and the difference is not obvious. "<oov>" is in the vocab, so I don't think it can be tokenized into 4 tokens. Let me know if you have more questions.
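
Whether "<oov>" ends up as one token or several depends on the tokenizer in front of the model. The sketch below is a toy illustration, not the repository's tokenization: `word_level_tokenize` and `toy_wordpiece_tokenize` are hypothetical helpers, and both vocabularies are made up. A word-level vocabulary that contains "<oov>" keeps it whole, while a subword tokenizer that splits on punctuation first can break it into pieces.

```python
import re
from typing import List, Set


def word_level_tokenize(text: str, vocab: Set[str], unk: str = "<oov>") -> List[str]:
    """Whitespace tokenizer: each word is looked up whole in the vocabulary."""
    return [w if w in vocab else unk for w in text.split()]


def toy_wordpiece_tokenize(text: str, vocab: Set[str], unk: str = "[UNK]") -> List[str]:
    """Toy subword tokenizer: split off punctuation, then apply greedy
    longest-match-first matching with '##' continuation pieces."""
    tokens: List[str] = []
    for chunk in re.findall(r"\w+|[^\w\s]", text):
        pieces: List[str] = []
        start = 0
        while start < len(chunk):
            end = len(chunk)
            piece = None
            while end > start:
                candidate = chunk[start:end]
                if start > 0:
                    candidate = "##" + candidate
                if candidate in vocab:
                    piece = candidate
                    break
                end -= 1
            if piece is None:
                # No piece matches: the whole chunk becomes the unknown token.
                pieces = [unk]
                break
            pieces.append(piece)
            start = end
        tokens.extend(pieces)
    return tokens


# Made-up vocabularies, for illustration only.
word_vocab = {"a", "movie", "<oov>"}
subword_vocab = {"<", ">", "o", "##ov"}

whole = word_level_tokenize("a <oov> movie", word_vocab)
split = toy_wordpiece_tokenize("<oov>", subword_vocab)
print(whole)
print(split)
```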

jind11, Jan 07 '21 00:01

Where is the emdding.npz file, please? Or how is it generated?

Youoo1, Oct 20 '21 14:10

The README explains how to obtain the embeddings: run the following command to pre-compute the cosine similarity scores between word pairs based on the counter-fitting word embeddings [https://drive.google.com/file/d/1bayGomljWb6HeYDMTDKXrh0HackKtSlx/view].

python comp_cos_sim_mat.py [PATH_TO_COUNTER_FITTING_WORD_EMBEDDINGS]
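
For anyone unsure what this pre-computation amounts to, here is a rough sketch of the idea, not the contents of `comp_cos_sim_mat.py`. It assumes the embedding file is plain text with one `word v1 v2 ...` entry per line; `parse_embeddings` and `cosine_similarity_matrix` are hypothetical names, and the three vectors are made up. It is written in pure Python for readability; a run over the full vocabulary would want a vectorized library such as NumPy.

```python
import math
from typing import Iterable, List, Tuple


def parse_embeddings(lines: Iterable[str]) -> Tuple[List[str], List[List[float]]]:
    """Parse 'word v1 v2 ...' lines into a word list and a list of vectors."""
    words: List[str] = []
    vectors: List[List[float]] = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        words.append(parts[0])
        vectors.append([float(x) for x in parts[1:]])
    return words, vectors


def cosine_similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Normalize every vector to unit length, then take all pairwise dot products."""
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
        unit.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


# Made-up 2-dimensional embeddings, for illustration only.
toy_lines = ["good 1.0 0.0", "great 2.0 0.0", "bad 0.0 3.0"]
vocab_words, vecs = parse_embeddings(toy_lines)
sim = cosine_similarity_matrix(vecs)
print(vocab_words)
print(sim)
```

Entry `sim[i][j]` is the cosine similarity between word `i` and word `j`, so looking up candidate synonyms later becomes a row lookup instead of a fresh computation.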

jind11, Oct 21 '21 06:10