TextAttack
IndexError when using WordSwapQWERTY to augment data
Describe the bug
An IndexError is raised when using WordSwapQWERTY to augment data.
To Reproduce
The following code reproduces the issue:
from textattack.transformations.word_swaps import WordSwapQWERTY
from textattack.constraints.pre_transformation import RepeatModification, StopwordModification
from textattack.augmentation import Augmenter
transformation = WordSwapQWERTY()
constraints = [RepeatModification(), StopwordModification()]
augmenter = Augmenter(transformation=transformation, constraints=constraints)
sentence = "This movie is excellent. I found it very interesting. I thought the Wendigo legend was pretty cool. The acting was also great, as well as the costumes, production, photography, directing and script. <br /><br />A very happy family, on vacation gets stranded in the middle of nowhere after they hit a deer. A huntsman then appears and is very angry and outraged over the fact that one of the deer's antler's is broken. He then starts to stalk the family and weird things start to happen to them. <br /><br />See this movie. It's worth it. Kudos to the cast, crew and filmmakers. Two Thumbs Way Up!"
t = augmenter.augment(sentence)
print(t)
- The error occurs when `random.choice` is called with an empty list.
- Some characters, such as `,`, `.` and `'`, do not exist in `_keyboard_adjacency`, so no adjacent keys are found for them.
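A minimal sketch of the failure mode. It assumes a hypothetical two-entry adjacency table in place of TextAttack's real `_keyboard_adjacency`, and a hypothetical helper `swap_one` that mirrors the unguarded `random_one` branch. When the chosen character (here the apostrophe in "deer's", taken from the review above) has no neighbours, `random.choice` receives `[]` and raises `IndexError`.

```python
import random


def get_adjacent(char):
    # Hypothetical, abbreviated table (NOT TextAttack's real one): any
    # character not listed, e.g. "'", has no neighbours and yields [].
    table = {"a": ["q", "w", "s", "z"], "s": ["a", "w", "e", "d", "x", "z"]}
    return table.get(char.lower(), [])


def swap_one(word, i):
    # Hypothetical helper mirroring the unguarded random_one branch:
    # random.choice is called on the neighbour list without checking it.
    return word[:i] + random.choice(get_adjacent(word[i])) + word[i + 1:]


word = "deer's"
try:
    swap_one(word, word.index("'"))
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```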
# code snippet from textattack/transformations/word_swaps/word_swap_qwerty.py
def _get_replacement_words(self, word):
    if len(word) <= 1:
        return []
    candidate_words = []
    start_idx = 1 if self.skip_first_char else 0
    end_idx = len(word) - (1 + self.skip_last_char)
    if start_idx >= end_idx:
        return []
    if self.random_one:
        i = random.randrange(start_idx, end_idx + 1)
        # If word[i] is missing from _keyboard_adjacency, _get_adjacent returns []
        # and random.choice raises IndexError on this line.
        candidate_word = (
            word[:i] + random.choice(self._get_adjacent(word[i])) + word[i + 1 :]
        )
        candidate_words.append(candidate_word)
    else:
        # No error on this path: an empty neighbour list just yields no candidates.
        for i in range(start_idx, end_idx + 1):
            for swap_key in self._get_adjacent(word[i]):
                candidate_word = word[:i] + swap_key + word[i + 1 :]
                candidate_words.append(candidate_word)
    return candidate_words
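One possible guard is sketched below as standalone functions rather than the actual TextAttack method. The names `get_adjacent`, `get_replacement_words` and `KEYBOARD_ADJACENCY` are hypothetical, the table is abbreviated, and this is not necessarily what the linked PR does. In the `random_one` branch, the sketch samples only among positions whose character has at least one neighbour, and returns `[]` when there are none.

```python
import random

# Hypothetical, abbreviated adjacency table for illustration only;
# TextAttack's real _keyboard_adjacency covers the whole keyboard.
KEYBOARD_ADJACENCY = {
    "a": ["q", "w", "s", "z"],
    "s": ["a", "w", "e", "d", "x", "z"],
    "d": ["s", "e", "r", "f", "c", "x"],
}


def get_adjacent(char):
    """Return keyboard neighbours of `char` (case preserved), or [] if unknown."""
    neighbours = KEYBOARD_ADJACENCY.get(char.lower(), [])
    if char.isupper():
        return [key.upper() for key in neighbours]
    return list(neighbours)


def get_replacement_words(word, random_one=True, skip_first_char=False, skip_last_char=False):
    if len(word) <= 1:
        return []
    start_idx = 1 if skip_first_char else 0
    end_idx = len(word) - (1 + skip_last_char)
    if start_idx >= end_idx:
        return []
    if random_one:
        # Only positions with at least one neighbour are eligible, so
        # random.choice below never receives an empty list.
        swappable = [i for i in range(start_idx, end_idx + 1) if get_adjacent(word[i])]
        if not swappable:
            return []
        i = random.choice(swappable)
        return [word[:i] + random.choice(get_adjacent(word[i])) + word[i + 1:]]
    # Exhaustive branch: characters without neighbours contribute nothing.
    candidate_words = []
    for i in range(start_idx, end_idx + 1):
        for swap_key in get_adjacent(word[i]):
            candidate_words.append(word[:i] + swap_key + word[i + 1:])
    return candidate_words


print(get_replacement_words("...") == [])  # True: nothing swappable, no IndexError
```

This changes the sampling slightly: the position is drawn uniformly from swappable positions instead of from all positions. A simpler alternative is to keep the original index draw and return `[]` whenever the drawn character has no neighbours.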
Hi @zzk0, thanks for catching this! Want to open a small pull request to fix the bug?
@jxmorris12 The PR https://github.com/QData/TextAttack/pull/714 may fix this.
Many thanks for TextAttack; it has helped me a lot.