FEAT Add NVIDIA AI Content Safety Dataset
Is your feature request related to a problem? Please describe.
Add this dataset to PyRIT; it is not currently part of PyRIT. https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0 NVIDIA's content safety model evaluation was performed using the content moderation benchmarks from this test set.
Describe the solution you'd like
The Hugging Face dataset: https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0 The associated paper: https://openreview.net/pdf?id=0MvGCv35wi
Additional context
Similar to previous dataset contributions, this should live in pyrit.datasets as a "fetch" function. The harm_categories property should also be set on each prompt; the Hugging Face dataset lists each prompt's harm categories in its "violated_categories" column. There are examples of how PyRIT interacts with other datasets here: https://github.com/search?q=repo%3AAzure%2FPyRIT%20%23%20The%20dataset%20sources%20can%20be%20found%20at%3A&type=code
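For reference, a rough sketch of what the fetch function could look like, following the pattern of the existing fetch functions linked above. This is a minimal sketch, not a finished implementation: the function name, the "train" split, the "prompt" and "prompt_label" column names, and the "unsafe" label value are assumptions to verify against the dataset card; only the "violated_categories" column is mentioned above.

```python
from datasets import load_dataset

from pyrit.models import SeedPrompt, SeedPromptDataset


def fetch_aegis_ai_content_safety_dataset_2() -> SeedPromptDataset:
    """Fetch NVIDIA's Aegis AI Content Safety Dataset 2.0 from Hugging Face (sketch)."""
    data = load_dataset("nvidia/Aegis-AI-Content-Safety-Dataset-2.0", split="train")

    prompts = []
    for item in data:
        # Per the discussion below, only the unsafe prompts are relevant.
        # "prompt_label" == "unsafe" is an assumed column name and label value.
        if item.get("prompt_label") != "unsafe":
            continue
        # "violated_categories" is assumed to be a comma-separated string.
        raw = item.get("violated_categories") or ""
        harm_categories = [c.strip() for c in raw.split(",") if c.strip()]
        prompts.append(
            SeedPrompt(
                value=item["prompt"],
                data_type="text",
                name="Aegis AI Content Safety Dataset 2.0",
                dataset_name="nvidia/Aegis-AI-Content-Safety-Dataset-2.0",
                harm_categories=harm_categories,
                source="https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0",
            )
        )

    return SeedPromptDataset(prompts=prompts)
```

The harm_categories parsing assumes a comma-separated string; if the column turns out to already be a list, the split can be dropped.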
[[Content Warning: Prompts are aimed at provoking the model and may contain offensive content.]] Additional disclaimer: given the content of these prompts, you may want to check with your relevant legal department before trying them against LLMs.
Is it fair to say that only the unsafe ones are relevant?
Agreed, only the unsafe ones are relevant :)
@romanlutz if no one else is working on this, I'd like to take this up!