FEAT hack-a-prompt target
Is your feature request related to a problem? Please describe.
https://www.linkedin.com/posts/learn-prompting_announcing-hackaprompt-20-the-worlds-activity-7329645371670827008-b3qg
Would be cool to have a target for this so that we can use PyRIT for the challenges.
Describe the solution you'd like
This can work similar to our Crucible Target or Gandalf Target.
Describe alternatives you've considered, if relevant
N/A
Additional context
N/A
Hey Roman,
Just a quick heads-up. I’ve reached out to Sander about the HackAPrompt challenge setup and asked about the request/response format and available targets.
I’ll take care of this and keep you posted once I hear back or make progress.
Update: Never mind! I see the challenges have already started, and I should be able to find that out myself easily.
Yes, the easiest way is probably to just intercept messages and reverse engineer 🙂 I'll assign it to you.
Hello Roman,
I wanted to share my findings after reverse engineering the HackAPrompt platform.
End-to-End Workflow
-
Login & Session Management
- HackAPrompt requires authentication via browser login (typically Google OAuth).
- All API requests (e.g.,
/api/chat) require session cookies, which can be copied from the browser or extracted automatically via Playwright. These expire, so keeping them fresh is necessary for automation.
-
Challenge Mapping
-
Each challenge has a descriptive title, detailed instructions, and a unique
challenge_slug(e.g. "Bacterial Basics" maps tobasic_challenge_cbrneunder thecbrnecompetition). -
For robust automation, we should map/display all three: challenge title, description, and slug.
-
Example Enum for mapping:
class HackAPromptChallenge(enum.Enum): BACTERIAL_BASICS = ( "basic_challenge_cbrne", "Bacterial Basics", "Your task is to convince this LLM to give you detailed instructions on how to obtain, process, and culture the bacteria that causes anthrax..." ) # More challenges here
-
-
Prompt Submission
- Prompts are submitted to
/api/chatwith a payload containing the session ID, challenge slug, competition slug, and messages (including prompt content). - Required cookies must be sent in the HTTP headers.
- Prompts are submitted to
-
Model Response Parsing
- The model’s answer is streamed back as a series of lines, each beginning with
0:and a quoted string. All0:lines are concatenated (after stripping quotes) to reconstruct the full response.
- The model’s answer is streamed back as a series of lines, each beginning with
-
Automated Judging
- To judge a submission, POST to
/api/challenges/{challenge_slug}/checkwith the session ID and competition slug. - The JSON response includes a
judgePanelarray with pass/fail results and explanations from multiple judges (e.g., "Judge Dreadful", "Grim Verdict", "Objection Jackson"). - This enables automated evaluation of attack success/failure.
- To judge a submission, POST to
-
Tooling & Rate Limits
- The platform enforces rate limits, so the automation should handle these gracefully.
Next Steps
I'm planning to start development of the hack-a-prompt target over the weekend of June 7-8 at the latest. My plan is to implement:
- Cookie/session management (manual and/or automated)
- Full challenge/slug mapping
- Submission and judge-check workflow
- Model and judge response parsing
However, I do have a few other deadlines coming up that may take priority. If this feature is particularly urgent, please feel free to jump in or get started, no need to wait for me!
Happy to review or collaborate as needed. :)
It's not urgent, and I like your plan!