FEAT Add SALT-NLP/mic Dataset
Name: SALT-NLP/mic
Link: https://www.dropbox.com/sh/m46z42nce8x0ttk/AABuSZiA6ESyrJNWmgTPrfuRa?dl=0
Relevant columns: "Q","A","rot","moral","rot-agree","A_agrees","violation-severity","worker_answer"
Note: rot is rule of thumb
Originally posted by @divyaamin9825 in #429
Describe the solution you'd like
This dataset should be available within PyRIT: https://huggingface.co/datasets/SALT-NLP/MIC
Also available here: https://github.com/SALT-NLP/mic
Associated paper: https://aclanthology.org/2022.acl-long.261/
Additional context
There are examples of how PyRIT interacts with other datasets here: https://github.com/search?q=repo%3AAzure%2FPyRIT%20%23%20The%20dataset%20sources%20can%20be%20found%20at%3A&type=code
[[Content Warning: Prompts are aimed at provoking the model, and may contain offensive content.]] Additional Disclaimer: Given the content of these prompts, keep in mind that you may want to check with your relevant legal department before trying them against LLMs.
Hello @nina-msft ! I would like to contribute to this issue. :)
Please go ahead @SidraEffendi ! Let us know if you need any help / pointers.
Unassigning this for now. If you'd like to pick it up again @SidraEffendi please reply here. Otherwise, we'll assume it's up for grabs.
Yes, I still want to work on it. I had a problem running workflows on my fork. Since there is no re-run/build option anymore, I plan to delete my fork and re-fork the main branch to see if that works. Do you have any suggestions in case I get stuck on this step?
Thanks, Sidra
On Mon, Mar 10, 2025, 2:14 AM Roman Lutz wrote:
Unassigning this for now. If you'd like to pick it up again @SidraEffendi please reply here. Otherwise, we'll assume it's up for grabs.
Assigned it right back to you! Thanks for reaching out.
Forks are a little funny that way... There are some concerns around secrets, so anything that requires those won't work on forks. AFAIK the workflows are mainly enabled on the "main" repo (the Azure/PyRIT one) and when you open a pull request they run automatically. Of course, you may want to run them during development already! For that case, we usually open the pull request in "draft" mode and prefix the title with "[DRAFT]"
To run the workflow, one of the maintainers has to approve. I try to do that as soon as I get a review in, and my colleagues do the same. To run the same checks locally, make sure you have everything installed with pip install -e .[dev], then run the tests with python -m pytest ./tests/unit and the remaining checks with pre-commit run --all-files. Any issues that pop up need to be fixed. If you don't understand them, please feel free to reach out here or on our Discord and someone will chime in with ideas 🙂
Thanks again for contributing to PyRIT!
Thanks for the info! I have started working on this issue and will use Discord if I run into any more problems.
@SidraEffendi any updates here? Would love to help you get unblocked. If you're no longer able to continue that's fine, too. In that case, we'll free it up for someone else. Otherwise, if I don't hear back by end of June I'll free it up.
I set up the dev containers and ran some unit tests, and they run fine. To start, I used the load_dataset function for library datasets to access the data from Hugging Face, but I am getting an error on the 'load_dataset' call. When I run the llm_latent_adversarial_training_harmful_dataset file, which also uses 'load_dataset', it runs fine. I looked online and found the closest discussion to my error here: error discussion. However, my code is no different from what is in the other files. Adding my code and error here:
from datasets import load_dataset

from pyrit.models import SeedPromptDataset
from pyrit.models.seed_prompt import SeedPrompt


def fetch_moral_integrity_dataset() -> SeedPromptDataset:
    data = load_dataset("SALT-NLP/MIC","default")
    prompts = [item["prompt"] for item in data["train"]]

    # Create SeedPrompt instances from each example in 'prompts'
    seed_prompts = [
        SeedPrompt(
            value=prompt,
            data_type="text",
            name="SALT-NLP/MIC",
            dataset_name="SALT-NLP/MIC",
            description="This dataset contains prompts used to capture moral assumption in llm",
            source="https://huggingface.co/datasets/SALT-NLP/MIC",
        )
        for prompt in prompts
    ]

    seed_prompt_dataset = SeedPromptDataset(prompts=seed_prompts)
    return seed_prompt_dataset
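A side note on the snippet above, separate from the error reported below: the relevant columns listed at the top of this issue are "Q", "A", "rot" and so on, and none of them is called "prompt", so item["prompt"] may raise a KeyError once loading works. The following is a minimal sketch of pulling prompt text out of rows, assuming the question column "Q" is the one to send to the model. That choice, the helper name extract_questions, and the sample rows are all hypothetical and not taken from the dataset or from PyRIT.

```python
from typing import Iterable, Mapping, Optional


def extract_questions(
    rows: Iterable[Mapping[str, Optional[str]]], column: str = "Q"
) -> list[str]:
    """Collect unique, non-empty values of `column`, keeping first-seen order.

    Assumption: each row is a dict-like record keyed by column name, which is
    how rows come back when iterating over a Hugging Face dataset split.
    """
    seen: set[str] = set()
    questions: list[str] = []
    for row in rows:
        text = (row.get(column) or "").strip()
        if text and text not in seen:
            seen.add(text)
            questions.append(text)
    return questions


# Made-up placeholder rows for illustration only; not real dataset content.
sample_rows = [
    {"Q": "placeholder question one", "A": "placeholder answer", "rot": "placeholder rule"},
    {"Q": "placeholder question one", "A": "another answer", "rot": "another rule"},
    {"Q": "  placeholder question two ", "A": "answer", "rot": "rule"},
    {"Q": None, "A": "answer without a question", "rot": "rule"},
]

questions = extract_questions(sample_rows)
```

Deduplicating is a design choice here: if the same question appears in several rows (for example with different answers or rules of thumb), sending it once may be enough for prompt generation. Drop the `seen` check if every row should yield a prompt.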
Error stacktrace:
Traceback (most recent call last):
  File "/workspace/pyrit/datasets/moral_integrity_dataset.py", line 79, in <module>
    fetch_moral_integrity_dataset()
  File "/workspace/pyrit/datasets/moral_integrity_dataset.py", line 21, in fetch_moral_integrity_dataset
    data = load_dataset("SALT-NLP/MIC","default")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/pyrit-dev/lib/python3.11/site-packages/datasets/load.py", line 1392, in load_dataset
    builder_instance = load_dataset_builder(
                       ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/pyrit-dev/lib/python3.11/site-packages/datasets/load.py", line 1132, in load_dataset_builder
    dataset_module = dataset_module_factory(
                     ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/pyrit-dev/lib/python3.11/site-packages/datasets/load.py", line 1031, in dataset_module_factory
    raise e1 from None
  File "/opt/conda/envs/pyrit-dev/lib/python3.11/site-packages/datasets/load.py", line 989, in dataset_module_factory
    raise RuntimeError(f"Dataset scripts are no longer supported, but found {filename}")
RuntimeError: Dataset scripts are no longer supported, but found MIC.py
I'm not familiar with this error. Other than trying it myself, I don't have any ideas on how to solve this right now, sorry.
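One possible starting point for narrowing this down: the RuntimeError is raised inside datasets/load.py, so it comes from the datasets library reacting to the MIC.py loading script in the dataset repository, not from the PyRIT code. That would also explain why other datasets without a loading script still load fine. The sketch below checks whether the installed datasets version is one that still runs loading scripts. It assumes script support was dropped starting with major version 4, which is my understanding and not something confirmed in this thread; the helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def script_loading_supported(datasets_version: str) -> bool:
    """Return True if this `datasets` version is expected to run loading scripts.

    Assumption (not confirmed in this thread): support for dataset loading
    scripts such as MIC.py was removed starting with major version 4.
    """
    major = int(datasets_version.split(".")[0])
    return major < 4


def installed_datasets_version() -> Optional[str]:
    """Return the installed `datasets` version string, or None if not installed."""
    try:
        return metadata.version("datasets")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_datasets_version()
    if version is None:
        print("datasets is not installed in this environment")
    elif script_loading_supported(version):
        print(f"datasets {version}: loading scripts should still be supported")
    else:
        print(f"datasets {version}: script-based datasets like MIC.py will not load")
```

If the check reports an unsupported version, the options would be to read the raw data files from the dataset repository directly, or to test with an older datasets release in a throwaway environment to confirm the diagnosis.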