TextAttack
Bug when attacking <SPLIT> in MNLI using CLARE
Describe the bug
In MNLI training example 255115, the CLARE attack generates the result below (original text first, then the perturbed text), which shows that it attacks the split token itself:
you still there Fernando sorry uh<SPLIT>I thought I lost you for a minute.
you still there Fernando sorry uh<SPLOTS IT>I thought I lost you for a minute.
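The problem lies here: https://github.com/QData/TextAttack/blob/master/textattack/shared/attacked_text.py#L415
Because "I" is a substring of "<SPLIT>", the lookup for the word "I" matches inside the split token, so the attack perturbs the SPLIT token instead of the word. Replace this line with: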
# get word_start
if (input_word in AttackedText.SPLIT_TOKEN) and (AttackedText.SPLIT_TOKEN in original_text):
    split_start = original_text.index(AttackedText.SPLIT_TOKEN)
    split_end = split_start + len(AttackedText.SPLIT_TOKEN)
    if original_text.index(input_word) < split_end and \
            original_text.index(input_word) >= split_start:
        # hit corner cases when input_word is in AttackedText.SPLIT_TOKEN
        word_start = original_text.replace(AttackedText.SPLIT_TOKEN, "").index(input_word) + len(AttackedText.SPLIT_TOKEN)
    else:
        word_start = original_text.index(input_word)
else:
    word_start = original_text.index(input_word)
This fixes the issue.
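For anyone who wants to see the substring collision without installing TextAttack, here is a minimal standalone sketch. It assumes the split token is the literal string "<SPLIT>" (as in the example above), and `find_word_start` is a hypothetical helper written for illustration, not a function in the library. It skips a match that overlaps the split token by searching again from the end of the token, which is a slightly different way of doing what the patch above does.

```python
# Assumption: mirrors AttackedText.SPLIT_TOKEN as it appears in the example text.
SPLIT_TOKEN = "<SPLIT>"


def find_word_start(original_text, input_word, split_token=SPLIT_TOKEN):
    """Hypothetical helper: index of the first occurrence of input_word
    that does not overlap the split token."""
    start = original_text.index(input_word)
    if split_token in original_text:
        split_start = original_text.index(split_token)
        split_end = split_start + len(split_token)
        # If the first match overlaps the split token, search again after it.
        if start < split_end and start + len(input_word) > split_start:
            start = original_text.index(input_word, split_end)
    return start


text = "you still there Fernando sorry uh<SPLIT>I thought I lost you for a minute."

naive = text.index("I")            # 37: the "I" inside "<SPLIT>"
fixed = find_word_start(text, "I")  # 40: the word "I" after the split token

print(naive, text[naive - 4:naive + 3])
print(fixed, text[fixed:fixed + 9])
```

Like `str.index`, the helper raises `ValueError` if the word does not occur outside the split token.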
@JFChi thanks for the fix! Can you submit a pull request and tag me?
@JFChi, can you please share a link to a Colab notebook where we can reproduce this issue?