FEAT: Malicious Question Generator

Open · KutalVolkan opened this issue 1 year ago · 7 comments

Overview

This PR introduces the Malicious Question Generator converter, adapted from Project Moonshot. The converter uses an LLM (through the PromptTarget) to generate cybersecurity-related malicious questions in Python list format.
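For illustration, here is a minimal standalone sketch of the flow described above. This is not the PyRIT API itself: the class name, the `send_prompt_async` callable, and the simplified `ConverterResult` are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class ConverterResult:
    # Simplified stand-in for PyRIT's ConverterResult container.
    output_text: str
    output_type: str = "text"


class MaliciousQuestionGeneratorSketch:
    """Illustrative stand-in: asks an LLM target to produce adversarial questions."""

    def __init__(self, send_prompt_async):
        # send_prompt_async: an async callable that sends a prompt string to the
        # target LLM and returns its text reply (stands in for a PromptTarget).
        self._send_prompt_async = send_prompt_async

    async def convert_async(self, *, prompt: str) -> ConverterResult:
        generation_prompt = (
            "Generate 5 edge-case adversarial cybersecurity questions about the "
            f"following topic: {prompt}. Return them as a Python list of strings."
        )
        response = await self._send_prompt_async(generation_prompt)
        return ConverterResult(output_text=response)
```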

Work Completed

  • Implemented malicious_question_generator_converter.py, which uses the LLM to generate edge-case adversarial questions.
  • Added a Jupyter notebook (malicious_question_generator_converter.ipynb) to document and demonstrate the converter’s functionality.
  • Developed an initial parsing approach to extract Python lists from LLM responses, with improvements needed (see Next Steps).

Next Steps (Feedback Requested)

I'm facing challenges in two areas:

Prompt Construction: The current prompt is adapted directly from Project Moonshot, where it's designed to generate malicious questions. I experimented with removing the phrase "Python list" in the prompt, but this led to more unpredictable output from the LLM, requiring additional parsing and cleanup. It seems that explicitly mentioning a "Python list" ensures the model returns output in a more structured and predictable format.
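As an illustration only (the actual template is adapted from Project Moonshot, so the exact wording below is an assumption), the format hint can be as simple as:

```python
# Illustrative prompt template: explicitly asking for a Python list keeps the
# output parseable; dropping that phrase tends to produce free-form answers.
PROMPT_TEMPLATE = (
    "You are a security researcher. Generate {count} challenging, edge-case "
    "questions about {topic}. Respond ONLY with a Python list of strings, "
    'e.g. ["question 1", "question 2"], with no additional commentary.'
)

prompt = PROMPT_TEMPLATE.format(count=5, topic="network intrusion detection")
```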

Parsing Enhancements: While the current approach to parsing works, dealing with extra formatting (like code block markers and commas) is still cumbersome. I've implemented cleaning methods to address this, but would welcome suggestions on more efficient ways to handle unexpected formatting and edge cases in LLM-generated content.
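One possible cleanup approach, sketched here under the assumption that the model wraps its answer in Markdown code fences (this is not the PR's actual implementation):

```python
import ast
import re


def parse_question_list(llm_output: str) -> list[str]:
    # Strip Markdown code-fence marker lines that models often wrap around the list.
    cleaned = re.sub(r"^```[a-zA-Z]*\s*$", "", llm_output.strip(), flags=re.MULTILINE).strip()
    try:
        parsed = ast.literal_eval(cleaned)
        if isinstance(parsed, list):
            return [str(item).strip() for item in parsed]
    except (SyntaxError, ValueError):
        pass
    # Fallback: treat each non-empty line as one question, trimming stray
    # quotes, commas, and bullet characters.
    return [line.strip(" \t-*'\",") for line in cleaned.splitlines() if line.strip(" \t-*'\",")]
```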

Related Issue

Contributes to #376, which focuses on adding attack modules from Project Moonshot.


Current Output (Screenshot):
Here’s an example of the current output format generated by the LLM with the existing implementation. The structure is mostly correct, but the interaction could be clearer: for example, each question could be sent as a separate user input so the assistant returns an individual response per question, rather than a single response covering all of them.


KutalVolkan · Sep 25 '24 08:09

Hello @romanlutz,

Summary of the Issue:

The current approach in the MaliciousQuestionGeneratorConverter generates all questions in one go and returns them concatenated into a single output_text. This causes the PromptSendingOrchestrator to treat all questions as one large prompt rather than handling each question individually.

This approach leads to two potential problems:

  1. Single Batch Handling: Since all questions are grouped together in one ConverterResult, the orchestrator sends them as a single prompt to the target LLM, rather than processing each question separately. This can reduce the flexibility of interacting with the target LLM.

  2. Independent Question Processing: In scenarios where each question needs to be processed independently—such as sending separate prompts or obtaining individual responses—this approach doesn't allow the orchestrator to manage each question as a distinct prompt.

Proposed Solutions:

  1. Modify convert_async to Return One Question at a Time: Instead of concatenating all questions, the converter can be updated to return one question per call. The orchestrator would then call the converter multiple times, each time handling a new question, thus maintaining the individual nature of each question (see the sketch after this list).

  2. Split the Output in the Orchestrator: Another option is to modify the orchestrator to split the concatenated output_text from the converter into separate questions. The orchestrator could then treat each split question as a distinct prompt, sending them one by one to the target LLM.
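A rough sketch of option 1 under assumed names (this is not the PR's code; `send_prompt_async` stands in for the PromptTarget call):

```python
import ast


class OneQuestionPerCallSketch:
    """Illustrative: generate the full question list once, then hand back one
    question per convert_async call so each prompt stays individual."""

    def __init__(self, send_prompt_async, count: int = 5):
        # send_prompt_async: async callable that queries the generator LLM.
        self._send_prompt_async = send_prompt_async
        self._count = count
        self._pending: list[str] = []

    async def convert_async(self, *, prompt: str) -> str:
        if not self._pending:
            raw = await self._send_prompt_async(
                f"Generate {self._count} adversarial questions about {prompt}. "
                "Respond only with a Python list of strings."
            )
            # Assumes the model honors the format; real code would clean and retry.
            self._pending = ast.literal_eval(raw.strip())
        # Return exactly one question per call (1-to-1 behavior).
        return self._pending.pop(0)
```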

Recommendation:

I believe the best solution is to modify the convert_async method in the MaliciousQuestionGeneratorConverter to return one question at a time. This approach allows the orchestrator to handle each question individually, maintaining flexibility without requiring changes to the orchestrator itself.

That said, I’m open to suggestions if there’s a better approach. If the current setup works well or you have other ideas based on your experience, I’m happy to adjust accordingly. Your decision will be valuable in guiding the best path forward.

KutalVolkan · Sep 25 '24 12:09

> Parsing Enhancements: While the current approach to parsing works, dealing with extra formatting (like code block markers and commas) is still cumbersome. I've implemented cleaning methods to address this, but would welcome suggestions on more efficient ways to handle unexpected formatting and edge cases in LLM-generated content.

That's a tough one... We have places where we expect JSON, which you have probably come across. We do retries if we can't parse.

There are some packages (e.g., guidance) that help, but we want to work with any target regardless of whether such support is available.
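In the same spirit as the JSON retry handling mentioned above, a hedged sketch of what a retry-until-parseable loop for the list output could look like (the helper name and signature are assumptions, not PyRIT's retry utilities):

```python
import ast


async def generate_with_retries(send_prompt_async, prompt: str, max_attempts: int = 3) -> list[str]:
    """Re-ask the model until its reply parses as a Python list, or give up."""
    last_error: Exception | None = None
    for _ in range(max_attempts):
        reply = await send_prompt_async(prompt)
        try:
            parsed = ast.literal_eval(reply.strip())
            if isinstance(parsed, list):
                return [str(item) for item in parsed]
        except (SyntaxError, ValueError) as exc:
            last_error = exc
    raise ValueError(f"Could not parse a Python list after {max_attempts} attempts") from last_error
```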

romanlutz · Sep 27 '24 23:09

Hello @romanlutz,

Thank you for your feedback! The requested changes have been made. Please feel free to review them again, and let me know if further adjustments are needed—I'll be happy to make them. :)

Also, would you mind giving feedback on whether it makes sense to modify convert_async to return one question at a time, or do you think it's not necessary for now?

KutalVolkan · Sep 29 '24 08:09

Ah, apologies, I missed some of your context. Must have been reviewing in the mobile app and missed the description and updates.

A bit of repo context here is that we used to have 1-to-n converters, as in converters that take one piece of text/image/video/audio and generate n outputs. For example, generate n rephrased prompts based on the input prompt. This is really useful when you want to generate stuff, but it is also very annoying when you write orchestrators that expect 1 output 🙂 So we restricted it to 1-to-1. We should follow the same here. Missed that earlier, my bad!

Of course, one can still call it multiple times if you're using it just to generate questions independently from orchestrators!
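For example, a caller outside an orchestrator could loop over a 1-to-1 converter like this (a sketch; whether convert_async returns a plain string or a ConverterResult depends on the final implementation):

```python
async def collect_questions(converter, topic: str, n: int) -> list[str]:
    # Call the 1-to-1 converter once per desired question and collect the results.
    questions = []
    for _ in range(n):
        result = await converter.convert_async(prompt=topic)
        # If the converter returns a ConverterResult, use result.output_text instead.
        questions.append(result)
    return questions
```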

Thanks for your patience 🙂

romanlutz · Oct 03 '24 02:10

Hello Roman,

Thank you for your reply :)

I’ve implemented a straightforward adjustment: the converter now generates one question at a time per iteration.

Would you like me to create tests for this change as well? Let me know if any further adjustments are needed.


Current Output (Screenshot)


KutalVolkan · Oct 03 '24 10:10

Hello @romanlutz,

This PR is ready for review. Feel free to provide feedback or suggest any enhancements. I’ll gladly make the necessary adjustments :)

KutalVolkan · Oct 08 '24 19:10

Hi @romanlutz,

After merging and running the pre-commit hooks, I encountered a mypy error. I didn’t want to modify code outside of my scope and potentially introduce issues in other parts of the project. Therefore, I decided to leave it as is but wanted to inform you about it.

Here’s a summary of the error:

pyrit\auxiliary_attacks\gcg\attack\base\attack_manager.py:1084: error: Module has no attribute "infty"  [attr-defined]
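For reference, that error usually comes from NumPy 2.0 removing the np.infty alias; the conventional fix (deliberately left out of this PR) is a one-line change:

```python
import numpy as np

# NumPy 2.0 dropped the `np.infty` alias; `np.inf` is the supported spelling.
best_loss = np.inf  # instead of: best_loss = np.infty
```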

KutalVolkan · Oct 10 '24 09:10

Hello @romanlutz, Hello @rlundeen2 :)

Thank you for your feedback! While applying it, I ran into the following error:

AttributeError: 'OpenAIChatTarget' object has no attribute '_deployment_name'

This occurred because the _deployment_name attribute was being referenced before it was properly initialized in the OpenAIChatTarget class. Specifically, the attribute was checked before being assigned during the non-Azure OpenAI initialization process. To resolve this, I moved the initialization of _deployment_name earlier in the constructor and added error handling to ensure that both the API key and deployment name are properly set, with clear error messages if they are missing or incorrect.
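A hedged sketch of the kind of reordering described above (class and attribute names are simplified; this is not the actual OpenAIChatTarget code):

```python
class ChatTargetSketch:
    def __init__(self, api_key: str | None = None, deployment_name: str | None = None):
        # Assign _deployment_name before any code path reads it, and fail fast
        # with clear messages if required settings are missing.
        self._api_key = api_key
        self._deployment_name = deployment_name

        if not self._api_key:
            raise ValueError("An API key must be provided for the chat target.")
        if not self._deployment_name:
            raise ValueError("A deployment (model) name must be provided for the chat target.")
```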

If there was an issue on my part from the beginning, I will revert the changes I made to the OpenAITarget.

KutalVolkan · Oct 17 '24 07:10

Hello @romanlutz, Hello @rlundeen2,

The changes are ready for review. Please don't hesitate to provide feedback—if you see any improvements needed, feel free to let me know, and I'll be happy to make the changes :)

Note: I need to ensure the comments are still relevant due to the recent code changes.

KutalVolkan · Oct 17 '24 09:10

Hello @rlundeen2 and @romanlutz,

I’ve reviewed the errors from the build_and_test pipeline: an incorrect attribute name (AzureOpenAITextChatTarget), missing named arguments (converter_target, prompt_template), and a few type-check suggestions. I’ll fix these first thing tomorrow morning.

KutalVolkan · Oct 17 '24 19:10

@KutalVolkan I made some changes:

  • I don't think we actually needed Python deserialization since we were just serializing/deserializing
  • I refactored the LLMGenericTarget so we could just pass in the task. This meant changing the tests around that also
  • I moved the notebook to 2_llm_converters.ipynb since I think we can show it working with a single command :)

Thanks again!

rlundeen2 · Oct 17 '24 21:10

@rlundeen2 Awesome! Thank you for demonstrating best practices through your example; I really appreciate it :)

KutalVolkan · Oct 18 '24 04:10