TextAttack Bug when attacking <SPLIT> in MNLI using CLARE

Describe the bug

In MNLI training example 255115, the CLARE attack generates the following result, which shows that it attacks the split token.

you still there Fernando sorry uh<SPLIT>I thought I lost you for a minute.
you still there Fernando sorry uh<SPLOTS IT>I thought I lost you for a minute.

Mar 24 '22 03:03 JFChi

The problem lies here: https://github.com/QData/TextAttack/blob/master/textattack/shared/attacked_text.py#L415

Because "I" is a substring of "<SPLIT>", the word lookup matches inside the split token, so the attack mistakenly modifies it. Replace this line with

# get word_start
word_start = original_text.index(input_word)
if (input_word in AttackedText.SPLIT_TOKEN) and (AttackedText.SPLIT_TOKEN in original_text):
    split_start = original_text.index(AttackedText.SPLIT_TOKEN)
    split_end = split_start + len(AttackedText.SPLIT_TOKEN)
    if split_start <= word_start < split_end:
        # corner case: the first match of input_word falls inside AttackedText.SPLIT_TOKEN,
        # so search the text with the token removed and shift the index by the token length
        word_start = original_text.replace(AttackedText.SPLIT_TOKEN, "").index(input_word) + len(AttackedText.SPLIT_TOKEN)

to fix the issue.
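
For anyone who wants to see the behavior without running a full attack, here is a minimal standalone sketch using the example sentence above. It is not TextAttack code: the module-level `SPLIT_TOKEN` is assumed to equal `AttackedText.SPLIT_TOKEN`, and `naive_word_start` and `safe_word_start` are hypothetical helper names that only mirror the lookup before and after the patch.

```python
# Assumption: same value as AttackedText.SPLIT_TOKEN in TextAttack.
SPLIT_TOKEN = "<SPLIT>"


def naive_word_start(original_text: str, input_word: str) -> int:
    """Hypothetical helper: first match anywhere, even inside the split token."""
    return original_text.index(input_word)


def safe_word_start(original_text: str, input_word: str) -> int:
    """Hypothetical helper mirroring the patch: skip a match inside SPLIT_TOKEN."""
    word_start = original_text.index(input_word)
    if input_word in SPLIT_TOKEN and SPLIT_TOKEN in original_text:
        split_start = original_text.index(SPLIT_TOKEN)
        split_end = split_start + len(SPLIT_TOKEN)
        if split_start <= word_start < split_end:
            # Search the text with the token removed, then shift the
            # index by the token length to map it onto the original text.
            stripped = original_text.replace(SPLIT_TOKEN, "")
            word_start = stripped.index(input_word) + len(SPLIT_TOKEN)
    return word_start


text = "you still there Fernando sorry uh<SPLIT>I thought I lost you for a minute."
naive = naive_word_start(text, "I")
fixed = safe_word_start(text, "I")
split_start = text.index(SPLIT_TOKEN)

# The naive lookup lands on the "I" inside "<SPLIT>".
print(split_start <= naive < split_start + len(SPLIT_TOKEN))  # True
# The patched lookup lands on the real word after the token.
print(text[fixed:fixed + 9])  # I thought
```

The sketch only covers the case where the first match falls inside the split token, the same corner case the patch targets.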

Mar 24 '22 03:03 JFChi

@JFChi thanks for the fix! Can you submit a pull request and tag me?

May 25 '22 17:05 jxmorris12

@JFChi, can you please share a link to a Colab notebook where we can reproduce this issue?

Jun 22 '22 19:06 VijayKalmath