PyRIT
Jailbreak datasets from Menz et al. publication
Adds a health-specific dataset of disinformation topics paired with jailbreak prompts.
Description
- Jailbreak dataset from https://doi.org/10.1136/bmj-2023-078538
- Two techniques:
- Six prompts each targeting health topics
- Four demographics for systematic testing
- Based on our BMJ publication evaluating LLM safeguards
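
For reviewers who want a feel for the test matrix, here is a minimal sketch of how the combinations could be enumerated. It assumes the dataset is a full cross of two techniques, six health topics, and four demographics; that reading is an assumption, and all labels below (`technique_1`, `health_topic_1`, `demographic_1`, and the helper `build_test_grid`) are hypothetical placeholders, not names from the dataset or from PyRIT.

```python
from itertools import product

# Hypothetical placeholder labels: the real technique, topic, and demographic
# names live in the dataset and are not listed in this description.
TECHNIQUES = [f"technique_{i}" for i in range(1, 3)]       # two techniques
TOPICS = [f"health_topic_{i}" for i in range(1, 7)]        # six health topics
DEMOGRAPHICS = [f"demographic_{i}" for i in range(1, 5)]   # four demographics


def build_test_grid(techniques, topics, demographics):
    """Return one record per (technique, topic, demographic) combination."""
    return [
        {"technique": t, "topic": p, "demographic": d}
        for t, p, d in product(techniques, topics, demographics)
    ]


grid = build_test_grid(TECHNIQUES, TOPICS, DEMOGRAPHICS)
print(len(grid))  # number of combinations under the full-cross assumption
```

Under the full-cross assumption this yields 2 × 6 × 4 combinations; the actual dataset may be organized differently.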
@microsoft-github-policy-service agree