FEAT Add Anthropic/model-written-evals Dataset

Open nina-msft opened this issue 1 year ago • 9 comments

Name: Anthropic/model-written-evals

Link: https://huggingface.co/datasets/Anthropic/model-written-evals

Relevant columns: "question", "answer_matching_behavior"

Originally posted by @divyaamin9825 in #429


Describe the solution you'd like

This dataset should be available within PyRIT: https://huggingface.co/datasets/Anthropic/model-written-evals

It is also available here: https://github.com/anthropics/evals

Associated paper: https://arxiv.org/abs/2212.09251

Additional context

There are examples of how PyRIT interacts with other datasets here: https://github.com/search?q=repo%3AAzure%2FPyRIT%20%23%20The%20dataset%20sources%20can%20be%20found%20at%3A&type=code

[[Content Warning: Prompts are aimed at provoking the model, and may contain offensive content.]] Additional Disclaimer: Given the content of these prompts, keep in mind that you may want to check with your relevant legal department before trying them against LLMs.

nina-msft avatar Oct 10 '24 17:10 nina-msft

Hi, @nina-msft. I'd love to contribute here. It would not be my first issue, though, if that is alright.

Tiger-Du avatar Oct 18 '24 02:10 Tiger-Du

Go ahead @Tiger-Du !

romanlutz avatar Oct 19 '24 05:10 romanlutz

Just a heads up that only a small subset of the dataset is downloadable via load_dataset through Hugging Face. So it will require a bit more work to fetch!

Tiger-Du avatar Jan 13 '25 02:01 Tiger-Du

Interesting! How does one get the rest? Or do you have to call it multiple times?

romanlutz avatar Jan 13 '25 15:01 romanlutz

@Tiger-Du are you still looking into this?

romanlutz avatar Mar 10 '25 06:03 romanlutz

Yes, but not very actively at the moment! It seems that the authors uploaded only a 3,000-row subset as a Hugging Face Dataset, so the full dataset should be downloaded from either of the repositories, https://huggingface.co/datasets/Anthropic/model-written-evals/tree/main or https://github.com/anthropics/evals!

Tiger-Du avatar Apr 10 '25 02:04 Tiger-Du

That's fine! I'll make this issue available to others for now but feel free to claim it again if you are interested.

romanlutz avatar Apr 10 '25 04:04 romanlutz

@romanlutz I can also work on this while I'm at it!

0xm00n avatar Nov 04 '25 00:11 0xm00n

Great! It may take some exploration first to find where best to get the dataset from, as explained above by @Tiger-Du.

romanlutz avatar Nov 04 '25 14:11 romanlutz
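
As a starting point for that exploration, here is a minimal sketch of reading the raw files directly instead of going through load_dataset. It is not PyRIT code. It assumes the files in the GitHub repository are JSON Lines, that each record carries the "question" and "answer_matching_behavior" columns named at the top of this issue, and that raw files are served from the `main` branch under the URL layout shown. The helper names `parse_jsonl` and `fetch_rows` and the constant `RAW_BASE` are made up for illustration.

```python
import json
import urllib.request

# Assumption: raw-file URL layout and branch name for the GitHub repository.
RAW_BASE = "https://raw.githubusercontent.com/anthropics/evals/main"


def parse_jsonl(text):
    """Parse JSON Lines text, keeping only the two relevant columns."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # Skip blank lines between records.
            continue
        record = json.loads(line)
        rows.append(
            {
                "question": record["question"],
                "answer_matching_behavior": record["answer_matching_behavior"],
            }
        )
    return rows


def fetch_rows(relative_path, base_url=RAW_BASE):
    """Download one file (path relative to the repo root) and parse it.

    Hypothetical helper: the caller supplies the path after checking
    the repository layout.
    """
    url = f"{base_url}/{relative_path}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_jsonl(response.read().decode("utf-8"))
```

Keeping the parsing separate from the download means the column handling can be checked on a small inline sample without network access. The same `parse_jsonl` would work on files fetched from the Hugging Face repository tree if they turn out to share the format.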