FEAT Add MathPromptConverter to Transform Prompts into Mathematical Problems
Is your feature request related to a problem? Please describe.
Currently, PyRIT lacks a converter that can transform harmful natural language prompts into mathematically encoded prompts. Recent research introduced MathPrompt, a technique that encodes harmful prompts into mathematical problems using concepts from set theory, abstract algebra, and symbolic logic. This method has been shown to effectively bypass safety mechanisms in Large Language Models (LLMs), highlighting a vulnerability in existing AI safety measures.
Integrating MathPrompt as a converter in PyRIT would enhance our ability to test and evaluate LLMs' robustness against such sophisticated attacks, ultimately contributing to more secure and reliable AI systems.
Describe the solution you'd like
- Implement a MathPromptConverter: create a new converter that transforms natural language prompts into mathematically encoded problems using the MathPrompt methodology.
- The converter should leverage elements of set theory, abstract algebra, and symbolic logic to represent the original prompt in mathematical terms.
- Ensure compatibility with PyRIT's existing framework so that the converter can be easily integrated into the prompt processing pipeline.
Additional context
- Reference Paper: Jailbreaking Large Language Models with Symbolic Mathematics
Hello @romanlutz,
I would like to take on this task and implement the MathPromptConverter :)
@KutalVolkan this looks promising! Go right ahead! Sorry for the delay in responding.