
bert-attack doesn't work very well

Open marsssser opened this issue 2 years ago • 13 comments

Hello, when I attack LSTM-IMDB with bert-attack, the attack stalls after the 11th example: the 12th result never completes. Do I just need to wait longer (more than 2 hours?), or is there a problem with the code?

marsssser, Nov 30 '21 12:11

@marsssser, are you using a GPU?

qiyanjun, Nov 30 '21 22:11

What attack recipe did you use?

qiyanjun, Nov 30 '21 22:11

I have the same problem. This is the command I used:

!textattack attack --recipe bert-attack --num-examples 1000 --model bert-base-uncased-mr --dataset-from-huggingface rotten_tomatoes --dataset-split test

DavidHerel, Dec 01 '21 18:12

I raised this issue in the Slack channel several months ago. BERT-Attack takes so long to run because of this line:

https://github.com/QData/TextAttack/blob/776dfece2aab2c6e0b9015d04696528e1706246d/textattack/transformations/word_swaps/word_swap_masked_lm.py#L228

The reasons are:

  1. The default value of K for BERT-Attack is 48 (see the code), and a word split into n sub-words yields K^n sub-word combinations, so the count grows exponentially with the number of sub-words.
  2. During the attack, if a word is tokenized into 4 sub-words, the number of combinations is 48^4 = 5,308,416, which is huge!
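The arithmetic above can be reproduced with a short Python sketch. `num_combinations` is a hypothetical helper, not part of TextAttack; it assumes, as the comment describes, that each of the K top predictions at every one of the n sub-word positions is combined with every other, so the count is K^n.

```python
# Hypothetical helper: how many sub-word combinations a product-style
# candidate search has to score for a single word.
def num_combinations(k: int, n_subwords: int) -> int:
    """Each of the n sub-word positions contributes k candidates and every
    combination is scored, so the total is k ** n_subwords."""
    return k ** n_subwords


if __name__ == "__main__":
    k = 48  # BERT-Attack's default K, per the comment above
    for n in range(1, 5):
        print(f"K={k}, sub-words={n}: {num_combinations(k, n):,} combinations")
```

For K=48 this prints 48, 2,304, 110,592 and 5,308,416 for one to four sub-words, so each additional sub-word multiplies the work by K.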

As far as I remember, the implementation in this library is correct; I think the problem may be a fundamental flaw in the algorithm itself. Still, I suggest four ways to mitigate the issue:

  1. Set K to something small (K <= 8 is fine).
  2. If you need to keep K=48, you can modify the code to skip words that are split into three or more sub-words.
  3. I also think the maintainers of this repo should show a warning to users who want to use BERT-Attack.
  4. Another way that was used in TextDefender is to ignore sub-words completely and only swap single words.
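The mitigations above can be sketched in plain Python. This is a toy model under stated assumptions, not TextAttack's code: `top_k_ids` and `candidate_combinations` are hypothetical names, and the scores are plain lists over a tiny vocabulary instead of masked-LM logits. It enumerates sub-word combinations with `itertools.product` (the combinatorial step the comment above blames for the slowdown), keeps `k` small for mitigation 1, and skips words with too many sub-words via `max_subwords` (mitigation 2 with `max_subwords=2`, mitigation 4 with `max_subwords=1`).

```python
import heapq
import itertools
from typing import List, Optional, Sequence, Tuple


def top_k_ids(scores: Sequence[float], k: int) -> List[int]:
    """Return the ids of the k highest-scoring entries, best first."""
    return heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])


def candidate_combinations(
    scores_per_position: Sequence[Sequence[float]],
    k: int,
    max_subwords: Optional[int] = None,
) -> List[Tuple[int, ...]]:
    """Enumerate candidate sub-word id tuples for one word (hypothetical).

    scores_per_position has one score list per sub-word of the word being
    replaced. A small k keeps the product small (mitigation 1); words with
    more than max_subwords pieces are skipped (mitigations 2 and 4).
    """
    if max_subwords is not None and len(scores_per_position) > max_subwords:
        return []  # skip the word instead of scoring k ** n combinations
    # Top-k ids at each sub-word position, then every combination of them.
    top_preds = [top_k_ids(scores, k) for scores in scores_per_position]
    return list(itertools.product(*top_preds))


if __name__ == "__main__":
    # Toy scores: a word split into 2 sub-words, vocabulary of 4 ids.
    toy = [[0.1, 0.9, 0.3, 0.7], [0.5, 0.2, 0.8, 0.4]]
    print(candidate_combinations(toy, k=2))  # 2 ** 2 = 4 tuples
    print(len(candidate_combinations(toy * 2, k=2)))  # 4 sub-words: 2 ** 4 = 16
    print(candidate_combinations(toy * 2, k=2, max_subwords=2))  # skipped: []
```

Skipping a word trades attack strength for speed: a skipped word is never perturbed by this transformation, while a small `k` still perturbs it but with fewer candidates.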

dangne, Dec 02 '21 03:12

@dangne, thank you for the excellent comment! We plan to build a faster version of BERT-Attack based on your suggestions. If you are interested in implementing it, please help us with a PR!

qiyanjun, Dec 02 '21 16:12

What attack recipe did you use?

!textattack attack --recipe bert-attack --model bert-base-uncased-yelp --num-examples 30

marsssser, Dec 09 '21 02:12


@dangne, thank you very much!

marsssser, Dec 09 '21 02:12

Hi all, I changed the max_candidates value from 48 to 7 on line 41 of 'bert_attack_li_2020.py', but the issue remains: the output still shows a max_candidates value of 48. Do I need to modify other .py files as well?
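One plausible explanation for an edit having no effect (an assumption; the thread never confirms the cause) is that Python imports an installed copy of TextAttack, for example from site-packages after a regular `pip install`, rather than the file that was edited. The sketch below uses only the standard library's `importlib.util.find_spec` to print the path Python would actually load; `module_origin` is a hypothetical helper name.

```python
import importlib.util
from typing import Optional


def module_origin(dotted_name: str) -> Optional[str]:
    """Return the file Python would load for dotted_name, or None.

    For a dotted name, find_spec imports the parent packages first.
    """
    try:
        spec = importlib.util.find_spec(dotted_name)
    except ImportError:
        return None  # a parent package is missing or failed to import
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    # If this path is not the file you edited, the edit cannot take effect.
    print(module_origin("textattack.attack_recipes.bert_attack_li_2020"))
```

If the printed path points into site-packages, reinstalling from the edited checkout with `pip install -e .` (an editable install) is one way to make source edits take effect.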

UTSJiyaoLi, Apr 17 '22 14:04

Thanks, let me take a look.


qiyanjun, Apr 17 '22 14:04

Any update?

mexiQQ, Aug 29 '23 03:08


All that's required is a minor change: set max_candidates=8. That should do the trick.
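As a rough sketch of why this small change matters: lowering max_candidates from 48 to 8 shrinks the sub-word product by a factor of (48/8)^n = 6^n, so a word split into 4 sub-words goes from 5,308,416 combinations to 4,096. `reduction_factor` is a hypothetical helper, and the count covers only the combination step identified earlier in the thread, not the rest of the attack.

```python
def reduction_factor(k_old: int, k_new: int, n_subwords: int) -> float:
    """Ratio of combinations scored before vs. after lowering K (hypothetical helper)."""
    return (k_old ** n_subwords) / (k_new ** n_subwords)


if __name__ == "__main__":
    for n in range(1, 5):
        # 48 -> 8 candidates: (48 / 8) ** n = 6 ** n times fewer combinations
        print(f"{n} sub-words: {reduction_factor(48, 8, n):,.0f}x fewer")
```

This prints 6x, 36x, 216x and 1,296x for one to four sub-words.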

HuichiZhou, Dec 13 '23 12:12

Would you mind submitting a PR? It seems super minor, so it should be easy to merge.

qiyanjun, Dec 13 '23 13:12


I have submitted PR #765, titled 'Add files via upload'. Looking forward to your feedback, and I hope it adds value to the project.

HuichiZhou, Dec 13 '23 13:12