PyRIT icon indicating copy to clipboard operation
PyRIT copied to clipboard

Support H-CoT: Hijacking the Chain-of-Thought to Jailbreak Reasoning Models

Open LifeHackerBee opened this issue 7 months ago • 3 comments

Is your feature request related to a problem? Please describe.

I recently learned about the jailbreak method “H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models”, a method which has been shown to successfully bypass safety filters in several large reasoning models including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking. It would be great to implement this feature.

Refer:

  1. https://github.com/dukeceicenter/jailbreak-reasoning-openai-o1o3-deepseek-r1
  2. https://maliciouseducator.org/

Describe the solution you'd like

Could you provide explicit support or integration for the H-CoT jailbreak method within your repository?

LifeHackerBee avatar Apr 25 '25 09:04 LifeHackerBee

@shellsniper this looks super interesting. Would you like to contribute it yourself?

romanlutz avatar Apr 25 '25 11:04 romanlutz

Hey, everyone. You may consider to use the code of this paper https://github.com/JACKPURCELL/AutoRAN-public

JACKPURCELL avatar Jun 16 '25 18:06 JACKPURCELL

Note: This is still up for grabs as we closed the PR #1103

romanlutz avatar Oct 21 '25 06:10 romanlutz