
IndexError when using WordSwapQWERTY to augment data

Open zzk0 opened this issue 2 years ago • 3 comments

Describe the bug

An IndexError is raised when using WordSwapQWERTY to augment data.

To Reproduce

The following code reproduces the issue.

from textattack.transformations.word_swaps import WordSwapQWERTY
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
from textattack.augmentation import Augmenter


transformation = WordSwapQWERTY()
constraints = [RepeatModification(), StopwordModification()]
augmenter = Augmenter(transformation = transformation, constraints = constraints)
sentence = "This movie is excellent. I found it very interesting. I thought the Wendigo legend was pretty cool. The acting was also great, as well as the costumes, production, photography, directing and script. <br /><br />A very happy family, on vacation gets stranded in the middle of nowhere after they hit a deer. A huntsman then appears and is very angry and outraged over the fact that one of the deer's antler's is broken. He then starts to stalk the family and weird things start to happen to them. <br /><br />See this movie. It's worth it. Kudos to the cast, crew and filmmakers. Two Thumbs Way Up!"
t = augmenter.augment(sentence)
print(t)

zzk0 avatar Jan 10 '23 09:01 zzk0

  • The error occurs when random.choice is called on an empty list, which raises IndexError.
  • Some characters, such as , . ', have no entry in _keyboard_adjacency, so _get_adjacent returns an empty list for them.
# code snippet from textattack/transformations/word_swaps/word_swap_qwerty.py
    def _get_replacement_words(self, word):
        if len(word) <= 1:
            return []

        candidate_words = []

        start_idx = 1 if self.skip_first_char else 0
        end_idx = len(word) - (1 + self.skip_last_char)

        if start_idx >= end_idx:
            return []

        if self.random_one:
            i = random.randrange(start_idx, end_idx + 1)
            candidate_word = (
                word[:i] + random.choice(self._get_adjacent(word[i])) + word[i + 1 :]
            )
            candidate_words.append(candidate_word)
        else:
            for i in range(start_idx, end_idx + 1):
                for swap_key in self._get_adjacent(word[i]):
                    candidate_word = word[:i] + swap_key + word[i + 1 :]
                    candidate_words.append(candidate_word)

        return candidate_words
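
A minimal standalone sketch of one possible guard for the random_one branch, for illustration only. It does not import TextAttack: KEYBOARD_ADJACENCY is a hypothetical, heavily trimmed stand-in for the real adjacency table, get_adjacent and random_qwerty_swap are made-up names, and this is not necessarily what the eventual fix in the library looks like. The idea is simply to check the adjacency list before handing it to random.choice.

```python
import random

# Hypothetical, trimmed adjacency map (assumption for this sketch);
# the real table in word_swap_qwerty.py covers the whole keyboard.
KEYBOARD_ADJACENCY = {
    "a": ["q", "w", "s", "z"],
    "s": ["a", "w", "e", "d", "z", "x"],
    "d": ["s", "e", "r", "f", "x", "c"],
}


def get_adjacent(char):
    """Return neighbouring keys, or [] for characters with no entry (e.g. , . ')."""
    neighbours = KEYBOARD_ADJACENCY.get(char.lower(), [])
    if char.isupper():
        return [key.upper() for key in neighbours]
    return list(neighbours)


def random_qwerty_swap(word, skip_first_char=False, skip_last_char=False):
    """Guarded variant of the random_one branch: returns [] instead of raising."""
    if len(word) <= 1:
        return []

    start_idx = 1 if skip_first_char else 0
    end_idx = len(word) - (1 + skip_last_char)
    if start_idx >= end_idx:
        return []

    i = random.randrange(start_idx, end_idx + 1)
    adjacent = get_adjacent(word[i])
    if not adjacent:
        # The sampled character has no neighbours; random.choice([])
        # would raise IndexError here, so give up on this word instead.
        return []

    return [word[:i] + random.choice(adjacent) + word[i + 1 :]]


if __name__ == "__main__":
    # "a's" contains an apostrophe, which has no adjacency entry.
    for _ in range(5):
        print(random_qwerty_swap("a's"))
```

With this guard, a word yields no candidate whenever the sampled position holds a character without neighbours. An alternative would be to sample only from positions whose character has at least one adjacent key, which produces a swap whenever one is possible.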

zzk0 avatar Jan 10 '23 09:01 zzk0

Hi @zzk0, thanks for catching this! Want to open a small pull request to fix the bug?

jxmorris12 avatar Feb 03 '23 20:02 jxmorris12

@jxmorris12 The PR https://github.com/QData/TextAttack/pull/714 may fix this.

Many thanks for TextAttack. It has helped me a lot.

zzk0 avatar Feb 17 '23 13:02 zzk0