
Fixed evaluation of models with random defenses

Open Buntender opened this issue 1 year ago • 5 comments

Thank you for your outstanding contributions.

@LYMDLUT and I put forward this PR to improve the evaluation of models with random defenses.

We've noticed that AutoAttack's current strategy for selecting the final output (clean/APGD etc.) is based on a single evaluation, regardless of whether the target model implements a random defense or not. This overlooks the variability of outputs in models with random defenses.

Relying on a single evaluation to filter samples for subsequent attacks leads to an inflated success rate and hinders the exploration of attack methods that could potentially yield superior outcomes.

To address this, we propose performing multiple evaluations for models with random defenses and choosing the adversarial example with the highest robustness (i.e., the one that fools the randomized model most consistently) as the final output.
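Below is a minimal sketch of this selection strategy, assuming a standard PyTorch classifier whose forward pass re-samples its randomness on every call. The helper names (`correct_counts`, `keep_strongest`) and the `n_eval` parameter are illustrative only and are not taken from the actual PR.

```python
import torch

@torch.no_grad()
def correct_counts(model, x, y, n_eval=10):
    """Count, per sample, how many of n_eval stochastic forward passes
    still classify x correctly (fewer correct = stronger attack)."""
    counts = torch.zeros(x.shape[0], device=x.device)
    for _ in range(n_eval):
        # the model's random defense is re-sampled on every forward call
        counts += (model(x).argmax(dim=1) == y).float()
    return counts

@torch.no_grad()
def keep_strongest(model, y, candidates, n_eval=10):
    """Among candidate inputs (e.g. the clean input and the outputs of
    several attacks), keep per sample the one the randomized model gets
    right least often across repeated evaluations."""
    best_x = candidates[0].clone()
    best_counts = correct_counts(model, best_x, y, n_eval)
    for x_cand in candidates[1:]:
        cand_counts = correct_counts(model, x_cand, y, n_eval)
        improve = cand_counts < best_counts  # this candidate fools the model more often
        best_x[improve] = x_cand[improve]
        best_counts[improve] = cand_counts[improve]
    return best_x
```

In this sketch, a single-evaluation filter would correspond to `n_eval=1`; increasing `n_eval` averages out the defense's randomness before a sample is declared robust or a candidate is kept.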

Buntender avatar Jan 19 '24 20:01 Buntender

@fra31 Could you please review this PR?

LYMDLUT avatar Mar 10 '24 13:03 LYMDLUT

Hello, your email has been received!

ScarlettChan avatar Mar 10 '24 13:03 ScarlettChan