PyRIT FEAT: Add TDC23-Red Teaming Dataset

Describe the solution you'd like

This dataset should be available within PyRIT: https://huggingface.co/datasets/walledai/TDC23-RedTeaming

Additional context

There are examples of how PyRIT interacts with other datasets here: https://github.com/search?q=repo%3AAzure%2FPyRIT%20%23%20The%20dataset%20sources%20can%20be%20found%20at%3A&type=code

[[Content Warning: Prompts are aimed at provoking the model, and may contain offensive content.]] Additional Disclaimer: Given the content of these prompts, keep in mind that you may want to check with your relevant legal department before trying them against LLMs.

Oct 02 '24 22:10 nina-msft

Can you please add me to this issue?

Oct 04 '24 20:10 Lakshmiaddepalli

I started working on this and tried to test it but ran out of time, so I'm adding my pseudocode here:

from datasets import load_dataset

from pyrit.models import PromptDataset  # assumed import path for PromptDataset


def fetch_tdc23_redteaming_dataset() -> PromptDataset:
    """
    Fetch TDC23-RedTeaming examples and create a PromptDataset.

    Returns:
        PromptDataset: A PromptDataset containing the examples.
    """
    # Load the TDC23-RedTeaming dataset from Hugging Face
    data = load_dataset("walledai/TDC23-RedTeaming", "default")

    # Keep only the 'prompt' column of the 'train' split
    prompts = [item["prompt"] for item in data["train"]]

    dataset = PromptDataset(
        name="TDC23-RedTeaming",
        description="""Prompts from the walledai/TDC23-RedTeaming dataset on Hugging Face.
        Only the 'prompt' column is extracted.""",
        source="https://huggingface.co/datasets/walledai/TDC23-RedTeaming",
        prompts=prompts,
    )

    return dataset
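
Since the fetch function above was not fully tested, here is a minimal sketch of how the prompt extraction could be checked offline, without network access or PyRIT installed. `PromptDatasetStub` and `build_tdc23_dataset` are hypothetical names used only for illustration; the stub merely mirrors the four fields passed to `PromptDataset` above, and the sample rows are placeholders, not real dataset content. Skipping blank prompts is an extra assumption, not something the original function does.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Mapping, Optional


@dataclass
class PromptDatasetStub:
    """Hypothetical stand-in for PromptDataset, for offline testing only."""

    name: str
    description: str
    source: str
    prompts: List[str] = field(default_factory=list)


def build_tdc23_dataset(rows: Iterable[Mapping[str, Optional[str]]]) -> PromptDatasetStub:
    """Build a dataset from rows shaped like {"prompt": "..."}."""
    prompts = []
    for row in rows:
        # Treat a missing or None prompt as empty, then trim whitespace.
        text = (row.get("prompt") or "").strip()
        if text:  # assumption: blank prompts are dropped
            prompts.append(text)

    return PromptDatasetStub(
        name="TDC23-RedTeaming",
        description="Prompts from the walledai/TDC23-RedTeaming dataset.",
        source="https://huggingface.co/datasets/walledai/TDC23-RedTeaming",
        prompts=prompts,
    )


# Placeholder rows standing in for data["train"].
sample_rows = [
    {"prompt": " placeholder prompt A "},
    {"prompt": ""},
    {"prompt": "placeholder prompt B"},
]
dataset = build_tdc23_dataset(sample_rows)
print(len(dataset.prompts), dataset.name)
```

Keeping the row handling separate from the `load_dataset` call means the name, source, and extracted prompts can be asserted on in a unit test before wiring in the real download.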

Oct 04 '24 22:10 Lakshmiaddepalli

Add this function to the datasets package's __init__.py as well, so that it can be imported from pyrit.datasets:

from pyrit.datasets import fetch_tdc23_redteaming_dataset

Oct 04 '24 22:10 Lakshmiaddepalli

Then follow this notebook example to run the dataset and get it working: https://github.com/Azure/PyRIT/blob/main/doc/code/orchestrators/pku_safe_rlhf_testing.ipynb

Oct 04 '24 22:10 Lakshmiaddepalli