FEAT Add LLM-LAT/harmful-dataset #420
Description
This PR addresses the issue requesting that LLM-LAT/harmful-dataset from Hugging Face be made available within PyRIT. @nina-msft @jbolor21
- Created a function in the fetch_example_datasets.py module within pyrit/datasets that loads LLM-LAT/harmful-dataset and extracts its "prompts" column
- Updated __init__.py within pyrit/datasets to export the new function from the package
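A minimal sketch of the kind of loader described above follows. It assumes the Hugging Face `datasets` library and a "train" split. The helper names `fetch_harmful_prompts` and `extract_prompts` are hypothetical and are not PyRIT's actual API. The column name is passed in rather than hard-coded, so it can be matched to the dataset's real schema.

```python
from typing import Iterable, Mapping


def extract_prompts(rows: Iterable[Mapping[str, object]], column: str) -> list[str]:
    """Collect the non-empty string values of `column` from dataset rows."""
    prompts: list[str] = []
    for row in rows:
        value = row.get(column)
        # Skip rows where the column is missing, empty, or not a string.
        if isinstance(value, str) and value.strip():
            prompts.append(value)
    return prompts


def fetch_harmful_prompts(column: str) -> list[str]:
    """Hypothetical helper: download LLM-LAT/harmful-dataset and return one column.

    Assumes the `datasets` package is installed and that the dataset has a
    "train" split containing `column`.
    """
    # Imported lazily so extract_prompts can be used without the optional dependency.
    from datasets import load_dataset

    dataset = load_dataset("LLM-LAT/harmful-dataset", split="train")
    # Iterating a Hugging Face Dataset yields one dict per row.
    return extract_prompts(dataset, column=column)
```

Keeping the row parsing separate from the download lets the extraction logic be unit-tested offline, without fetching the dataset.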
Tests and Documentation
- Added hf_harmful_dataset_testing.ipynb within doc/code/orchestrators, which tests the dataset
@microsoft-github-policy-service agree
@SnehaDharne - great job!
Are you able to run the pre-commit hooks on your end? That should fix the build failures. If you installed PyRIT following our contributor guide, the pre-commit dependency should already be available in your conda Python environment.
pre-commit run --all-files
@SnehaDharne let us know if you have any follow-up questions :-) We're ready to merge this work once the pre-commit hooks are run through on your end.
Hi, yes, I am able to run the pre-commit hooks successfully.
Hey @SnehaDharne - did the pre-commit hooks make any changes to your code that you might need to push? I just reran this PR through the CI and am seeing it fail here: https://github.com/Azure/PyRIT/actions/runs/11374109930/job/31691266872?pr=437
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook
Fixing doc/code/orchestrators/hf_harmful_dataset_testing.py
Fixing pyrit/datasets/fetch_example_datasets.py
fix end of files.........................................................Passed
check yaml...............................................................Passed
check for added large files..............................................Passed
detect private key.......................................................Passed
black....................................................................Failed
- hook id: black
- files were modified by this hook
reformatted pyrit/datasets/__init__.py
reformatted pyrit/datasets/fetch_example_datasets.py
All done! ✨ 🍰 ✨
2 files reformatted, 324 files left unchanged.
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
pyrit/datasets/fetch_example_datasets.py:433:121: E501 line too long (122 > 120 characters)
Check Links in Python and md Files.......................................Passed
pylint...................................................................Passed
mypy.....................................................................Passed
When you run the hooks locally, some of them fix files automatically, but those fixes still need to be committed and pushed. From the results above, the trim trailing whitespace and black hooks modify the files themselves, whereas the flake8 failure (E501 at line 433, column 121 of pyrit/datasets/fetch_example_datasets.py: a 122-character line against the 120-character limit) is one you'll have to fix manually and push.
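In flake8 output such as `fetch_example_datasets.py:433:121: E501`, the two numbers are the line and the 1-based column, and E501 points at the first column past the limit. The sketch below is a simplified approximation of that check. It ignores pycodestyle's `noqa` and long-URL exceptions, and `find_long_lines` is a hypothetical helper rather than part of flake8. It can help locate offending lines before pushing.

```python
def find_long_lines(path: str, text: str, max_line_length: int = 120) -> list[str]:
    """Report lines longer than max_line_length in flake8's E501 message format."""
    reports: list[str] = []
    for line_number, physical_line in enumerate(text.splitlines(), start=1):
        # Trailing whitespace is stripped before measuring, as pycodestyle does.
        length = len(physical_line.rstrip())
        if length > max_line_length:
            # The reported column is 1-based: the first character past the limit.
            column = max_line_length + 1
            reports.append(
                f"{path}:{line_number}:{column}: E501 line too long ({length} > {max_line_length} characters)"
            )
    return reports
```

Running it on a file whose second line is 122 characters long yields a single report at column 121 ending in `(122 > 120 characters)`, matching the shape of the CI failure above.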
If you're not seeing this, could you post your results in a comment here and we can investigate what might be going on with our code? Thanks!
I made the fix, but I am not able to push the changes because fetching and pulling main brought a lot of changes into my branch.
Sorry @SnehaDharne, I'm not quite following. Are you unable to push because of merge conflicts with main?
I can look into making these changes for you and pushing to your branch if you're having issues - then we still maintain that you're the author of this work :-) I'll take a look tomorrow!
Okay, I have pushed the changes. Let me know if the issue still persists.
Merged - thank you @SnehaDharne 🥳