FEAT Add NVIDIA AI Content Safety Dataset

Open chenss3 opened this issue 4 months ago • 2 comments

Is your feature request related to a problem? Please describe.

Add this dataset to PyRIT; it is not yet part of PyRIT. https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0 NVIDIA used the content moderation benchmarks from this test set to evaluate its models for content safety.

Describe the solution you'd like

The Hugging Face dataset: https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0
The associated paper: https://openreview.net/pdf?id=0MvGCv35wi

Additional context

Similar to previous dataset contributions, this should live in pyrit.datasets as a "fetch" function. The harm_categories property should also be set on each prompt. This Hugging Face dataset lists the harm categories for each row in the "violated_categories" column. There are examples of how PyRIT interacts with other datasets here: https://github.com/search?q=repo%3AAzure%2FPyRIT%20%23%20The%20dataset%20sources%20can%20be%20found%20at%3A&type=code
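To make the harm_categories mapping concrete, here is a minimal sketch of the row-conversion step such a "fetch" function might perform. It deliberately avoids the PyRIT API: the function names, the output dictionary keys, and the only_unsafe flag are hypothetical. It also assumes that "violated_categories" holds a comma-separated string and that a "prompt_label" column marks rows as "safe" or "unsafe"; both should be checked against the dataset card.

```python
from typing import Any, Dict, Iterable, List


def split_categories(raw: Any) -> List[str]:
    """Split a comma-separated category string into a clean list.

    Assumption: the dataset stores categories as one comma-separated string.
    """
    if not raw:
        return []
    return [part.strip() for part in str(raw).split(",") if part.strip()]


def rows_to_prompt_records(
    rows: Iterable[Dict[str, Any]],
    only_unsafe: bool = True,
) -> List[Dict[str, Any]]:
    """Convert raw dataset rows into simple prompt records.

    Hypothetical helper, not part of PyRIT. Each record carries the prompt
    text plus a harm_categories list taken from "violated_categories".
    """
    records: List[Dict[str, Any]] = []
    for row in rows:
        # Assumed column name and label values: "prompt_label" in {"safe", "unsafe"}.
        if only_unsafe and row.get("prompt_label") != "unsafe":
            continue
        text = row.get("prompt")
        if not text:
            continue  # Skip rows that have no prompt text.
        records.append(
            {
                "value": text,
                "harm_categories": split_categories(row.get("violated_categories")),
            }
        )
    return records
```

The rows could come from the Hugging Face datasets library, for example by iterating over the result of load_dataset("nvidia/Aegis-AI-Content-Safety-Dataset-2.0", split=...), with the split name taken from the dataset card. A real contribution would then wrap each record in whatever prompt type the existing pyrit.datasets fetch functions return.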

[[Content Warning: Prompts are aimed at provoking the model, and may contain offensive content.]] Additional Disclaimer: Given the content of these prompts, keep in mind that you may want to check with your relevant legal department before trying them against LLMs.

chenss3 • Aug 07 '25 19:08

Is it fair to say that only the unsafe ones are relevant?

romanlutz • Aug 08 '25 06:08

> Is it fair to say that only the unsafe ones are relevant?

Agreed, only the unsafe ones are relevant :)

chenss3 • Aug 15 '25 16:08

@romanlutz if no one else is working on this I'd like to take this up!

riyosha • Nov 28 '25 06:11