Bug when attacking <SPLIT> in MNLI using CLARE

Status: Open. JFChi opened this issue 3 years ago · 3 comments

Describe the bug

On MNLI training example 255115, the CLARE attack generates the following result, in which the <SPLIT> token itself is perturbed:

Original: you still there Fernando sorry uh<SPLIT>I thought I lost you for a minute.

Perturbed: you still there Fernando sorry uh<SPLOTS IT>I thought I lost you for a minute.
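The corrupted output can be reproduced without TextAttack using a simplified, hypothetical model of the offset loop in textattack/shared/attacked_text.py. This sketch makes three assumptions. First, the inputs are joined with "<SPLIT>", assumed to be the value of `AttackedText.SPLIT_TOKEN`. Second, each word's start is found with `str.index` on the not-yet-consumed text. Third, the CLARE insertion is modelled as replacing the word "I" with "OTS I". The helper names and word lists are illustrative, not TextAttack code.

```python
# Hypothetical, simplified model of how the perturbed text is rebuilt.
# SPLIT_TOKEN is assumed to equal AttackedText.SPLIT_TOKEN.
SPLIT_TOKEN = "<SPLIT>"


def word_start_naive(remaining: str, word: str) -> int:
    # First occurrence anywhere, including inside SPLIT_TOKEN.
    return remaining.index(word)


def rebuild(inputs, old_words, new_words, find_start):
    """Splice new_words into the joined inputs, one word at a time."""
    remaining = SPLIT_TOKEN.join(inputs)
    perturbed = ""
    for old, new in zip(old_words, new_words):
        start = find_start(remaining, old)
        perturbed += remaining[:start] + new      # keep the gap, then the new word
        remaining = remaining[start + len(old):]  # consume the old word
    perturbed += remaining                        # trailing punctuation
    return perturbed.split(SPLIT_TOKEN)


premise = "you still there Fernando sorry uh"
hypothesis = "I thought I lost you for a minute."
old_words = (premise + " " + hypothesis.rstrip(".")).split()
new_words = list(old_words)
new_words[6] = "OTS I"  # assumed CLARE-style insertion before the first "I"

print(rebuild([premise, hypothesis], old_words, new_words, word_start_naive))
```

The printed list has a single element containing "<SPLOTS IT>". The insertion lands inside the split token, so the token no longer separates the two inputs and the premise and hypothesis collapse into one string.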

JFChi, Mar 24 '22

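The problem lies here: https://github.com/QData/TextAttack/blob/master/textattack/shared/attacked_text.py#L415

That line computes `word_start = original_text.index(input_word)`, which returns the first occurrence of the word in the remaining text. Because "I" also occurs inside "<SPLIT>", the search matches the I in the split token, and the attack perturbs the split token instead of the word. Replace this line with the following: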

```python
# get word_start
if (input_word in AttackedText.SPLIT_TOKEN) and (AttackedText.SPLIT_TOKEN in original_text):
    split_start = original_text.index(AttackedText.SPLIT_TOKEN)
    split_end = split_start + len(AttackedText.SPLIT_TOKEN)
    if split_start <= original_text.index(input_word) < split_end:
        # Corner case: the first match of input_word lies inside AttackedText.SPLIT_TOKEN,
        # so search again with the split token removed and add its length back.
        word_start = original_text.replace(AttackedText.SPLIT_TOKEN, "").index(input_word) + len(AttackedText.SPLIT_TOKEN)
    else:
        word_start = original_text.index(input_word)
else:
    word_start = original_text.index(input_word)
```

This fixes the issue.
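To check the proposed logic in isolation, it can be lifted into a standalone function. The name `word_start_fixed` is hypothetical, and `SPLIT_TOKEN` is again assumed to be "<SPLIT>". The sketch is applied to the text that remains when the word "I" is located, which is right after "uh" has been consumed.

```python
SPLIT_TOKEN = "<SPLIT>"  # assumed value of AttackedText.SPLIT_TOKEN


def word_start_fixed(original_text: str, input_word: str) -> int:
    """Proposed fix as a hypothetical helper: skip a match inside SPLIT_TOKEN."""
    if input_word in SPLIT_TOKEN and SPLIT_TOKEN in original_text:
        split_start = original_text.index(SPLIT_TOKEN)
        split_end = split_start + len(SPLIT_TOKEN)
        if split_start <= original_text.index(input_word) < split_end:
            # Search again with the split token removed, then add its length,
            # assuming the real match comes after a single split token.
            stripped = original_text.replace(SPLIT_TOKEN, "")
            return stripped.index(input_word) + len(SPLIT_TOKEN)
    return original_text.index(input_word)


remaining = "<SPLIT>I thought I lost you for a minute."
print(remaining.index("I"))              # 4: the "I" inside "<SPLIT>"
print(word_start_fixed(remaining, "I"))  # 7: the real word "I"
```

The `+ len(SPLIT_TOKEN)` shift is only correct when the split token occurs once and precedes the real match. That holds for a premise/hypothesis pair like the one above, but inputs joined by several split tokens would need a different offset.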

JFChi, Mar 24 '22

@JFChi thanks for the fix! Can you submit a pull request and tag me?

jxmorris12, May 25 '22

@JFChi, can you please share a link to a Colab notebook where we can reproduce this issue?

VijayKalmath, Jun 22 '22