[DRAFT] Iterative Attack Strategy & Multi-Branch Attack
Description
Enables iterative, multi-turn attack strategies with minimal changes. This is a byproduct of an attempt to create a multi-branch attack during the Hackathon.
The current execute_with_context_async and perform_async methods defined by Strategy and its subclasses assume that an attack exposes no intermediate state between initiation and conclusion, so they return StrategyResultT. Users building interactive attacks, where a single step does not necessarily conclude the attack, therefore cannot extend these methods. Calling an execute... method always returns a final result, so stepwise operation is not possible, for example in a multi-branch attack where users navigate a tree node by node to retry prompts.
My proposed fix is to add IntermediateAttackResult, a subclass of AttackResult that holds a StrategyContext object. Subclasses of AttackStrategy can choose to return this instead of an AttackStrategyResult to pass state between calls to execute_.... This design has a few properties:
- It does not break existing attacks, because it subclasses AttackResult and does not change its parent. The alternative, adding an optional StrategyContext attribute to AttackStrategy, remains an option but could change the implementation of every AttackStrategy subclass.
- Determining whether an attack is finished requires only a type check. Once an IntermediateAttackResult contains the final attack state, its context is set to None, and the caller extracts its fields to return an AttackStrategyResult.
- It keeps a single Strategy ABC with minimal changes, so we need neither multiple inheritance nor subclassing of AttackStrategyInteractive or AttackStrategyAtomic.
- New attacks can reuse existing code from their parent classes with minimal changes. Their only responsibility is to ensure that the user can, at some point, obtain a final AttackStrategy result.
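A minimal sketch of how the pieces described above could fit together. Every class here is a simplified, hypothetical stand-in written for illustration; the field names (objective, outcome, executed_turns), the toy ThreeTurnAttack strategy, and the run_to_completion helper are assumptions, not the project's actual definitions. It is meant only to show the stepwise flow: each call returns an IntermediateAttackResult carrying a context, and the caller stops once the context is None.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StrategyContext:
    """Hypothetical stand-in: state carried between steps of an attack."""
    objective: str
    turn: int = 0
    history: List[str] = field(default_factory=list)


@dataclass
class AttackResult:
    """Hypothetical stand-in for the existing final result type."""
    objective: str
    outcome: str = "undetermined"
    executed_turns: int = 0


@dataclass
class IntermediateAttackResult(AttackResult):
    """Result of one step; context is None once the attack has concluded."""
    context: Optional[StrategyContext] = None


class ThreeTurnAttack:
    """Toy strategy (assumption): it needs three calls to finish."""
    max_turns = 3

    async def execute_with_context_async(
        self, *, context: StrategyContext
    ) -> AttackResult:
        # Perform exactly one step and record it in the shared context.
        context.turn += 1
        context.history.append(f"prompt {context.turn}")
        done = context.turn >= self.max_turns
        return IntermediateAttackResult(
            objective=context.objective,
            outcome="success" if done else "undetermined",
            executed_turns=context.turn,
            context=None if done else context,
        )


async def run_to_completion(strategy, context: StrategyContext) -> AttackResult:
    """Drive a stepwise strategy until it yields a final result."""
    result = await strategy.execute_with_context_async(context=context)
    # Keep stepping while the result is intermediate and still has a context.
    while isinstance(result, IntermediateAttackResult) and result.context is not None:
        result = await strategy.execute_with_context_async(context=result.context)
    if isinstance(result, IntermediateAttackResult):
        # Extract the fields into a plain final result for the caller.
        return AttackResult(
            objective=result.objective,
            outcome=result.outcome,
            executed_turns=result.executed_turns,
        )
    return result
```

An interactive caller would not use a loop like run_to_completion; it would call execute_with_context_async once per user action and hold on to result.context between calls, which is what makes node-by-node navigation possible. Existing strategies that return a plain AttackResult pass through the helper unchanged.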
Tests and Documentation
None yet.