FEAT Add Anthropic/model-written-evals Dataset
Name: Anthropic/model-written-evals
Link: https://huggingface.co/datasets/Anthropic/model-written-evals
Relevant columns: "question", "answer_matching_behavior"
Originally posted by @divyaamin9825 in #429
Describe the solution you'd like
This dataset should be available within PyRIT: https://huggingface.co/datasets/Anthropic/model-written-evals
Also available here: https://github.com/anthropics/evals
Associated paper: https://arxiv.org/abs/2212.09251
Additional context
There are examples of how PyRIT interacts with other datasets here: https://github.com/search?q=repo%3AAzure%2FPyRIT%20%23%20The%20dataset%20sources%20can%20be%20found%20at%3A&type=code
[[Content Warning: Prompts are aimed at provoking the model, and may contain offensive content.]] Additional Disclaimer: Given the content of these prompts, keep in mind that you may want to check with your relevant legal department before trying them against LLMs.
Hi, @nina-msft. I'd love to contribute here. This would not be my first issue, though, if that is all right.
Go ahead @Tiger-Du !
Just a heads-up that only a small subset of the dataset is downloadable via load_dataset through Hugging Face, so it will require a bit more work to fetch!
Interesting! How does one get the rest? Or do you have to call it multiple times?
@Tiger-Du are you still looking into this?
Yes, but not very actively at the moment! It seems that the authors uploaded only a 3,000-row subset as a Hugging Face Dataset, so the full dataset should be downloaded from either of the repositories, https://huggingface.co/datasets/Anthropic/model-written-evals/tree/main or https://github.com/anthropics/evals!
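For anyone picking this up, here is a rough sketch of reading files straight from the Hugging Face repository instead of going through load_dataset. It is not PyRIT code. It assumes the files in the repository are JSON Lines and that each row carries the "question" and "answer_matching_behavior" columns named in this issue. The helper names parse_jsonl and fetch_jsonl are made up for illustration, and the relative path of each file still has to be looked up in the repository tree.

```python
import json
import urllib.request

# Raw-file base URL for the dataset repository (Hugging Face "resolve" URL pattern).
BASE_URL = "https://huggingface.co/datasets/Anthropic/model-written-evals/resolve/main/"


def parse_jsonl(text, columns=("question", "answer_matching_behavior")):
    """Parse JSON Lines text and keep only the requested columns.

    Assumption: one JSON object per line; missing columns become None.
    """
    records = []
    # Split on "\n" only, so unusual Unicode line separators inside strings are kept.
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        records.append({col: row.get(col) for col in columns})
    return records


def fetch_jsonl(relative_path):
    """Download one file from the repository and parse it (requires network access).

    relative_path is hypothetical input: take it from the repository file listing.
    """
    with urllib.request.urlopen(BASE_URL + relative_path) as response:
        return parse_jsonl(response.read().decode("utf-8"))
```

Calling fetch_jsonl once per file and concatenating the results would cover the whole dataset, at the cost of maintaining a list of file paths. Cloning https://github.com/anthropics/evals and running parse_jsonl over the local files would be an alternative.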
That's fine! I'll make this issue available to others for now but feel free to claim it again if you are interested.
@romanlutz I can also work on this while I'm at it!
Great! It may take some exploration first to find the best place to get the dataset from, as explained above by @Tiger-Du.