Support H-CoT: Hijacking the Chain-of-Thought to Jailbreak Reasoning Models
Is your feature request related to a problem? Please describe.
I recently came across the jailbreak method described in “H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models”, which has been shown to successfully bypass safety filters in several large reasoning models, including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking. It would be great to support this method.
References:
- https://github.com/dukeceicenter/jailbreak-reasoning-openai-o1o3-deepseek-r1
- https://maliciouseducator.org/
Describe the solution you'd like
Could you provide explicit support or integration for the H-CoT jailbreak method within your repository?
@shellsniper this looks super interesting. Would you like to contribute it yourself?
Hey everyone, you may consider using the code from this paper: https://github.com/JACKPURCELL/AutoRAN-public
Note: This is still up for grabs, as we closed PR #1103.