FEAT Add LLM-LAT/harmful-dataset #420

Open SnehaDharne opened this issue 1 year ago • 2 comments

Description

The issue concerned adding a feature to make the LLM-LAT/harmful-dataset from Hugging Face available within PyRIT. @nina-msft @jbolor21

  • Created a function in the fetch_example_datasets.py module within pyrit/datasets that loads LLM-LAT/harmful-dataset and extracts its "prompts" column
  • Updated __init__.py within pyrit/datasets so that the new function is exported from the package
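
The loading step described above could look roughly like the following. This is a minimal sketch, not the code merged in this PR: the names `extract_prompts` and `fetch_harmful_dataset_prompts`, the `train` split, and the default column name are assumptions for illustration (the description calls the column "prompts"; the dataset card is the place to confirm the actual name). It relies on `load_dataset` from the Hugging Face `datasets` library.

```python
from typing import Any, Iterable, List, Mapping


def extract_prompts(rows: Iterable[Mapping[str, Any]], column: str = "prompts") -> List[str]:
    """Collect the non-empty string values of `column` from dataset rows."""
    prompts = []
    for row in rows:
        value = row.get(column)
        # Skip rows where the column is missing, not a string, or blank.
        if isinstance(value, str) and value.strip():
            prompts.append(value.strip())
    return prompts


def fetch_harmful_dataset_prompts(column: str = "prompts", split: str = "train") -> List[str]:
    """Download LLM-LAT/harmful-dataset and return one column as a list of strings.

    Hypothetical helper: the split and column names are assumptions.
    """
    # Imported lazily so extract_prompts stays usable without the `datasets` package.
    from datasets import load_dataset

    dataset = load_dataset("LLM-LAT/harmful-dataset", split=split)
    return extract_prompts(dataset, column=column)
```

Keeping the extraction separate from the download means the column handling can be unit-tested with plain dictionaries, without network access.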

Tests and Documentation

  • Added hf_harmful_dataset_testing.ipynb within pyrit/doc/code/orchestrators, which tests the dataset

SnehaDharne avatar Oct 06 '24 00:10 SnehaDharne

@microsoft-github-policy-service agree

SnehaDharne avatar Oct 06 '24 00:10 SnehaDharne

@SnehaDharne - great job!

Are you able to run the pre-commit hooks on your end? That should fix the build failures. If you installed PyRIT following our contributor guide, you should have the pre-commit dependency available in your conda Python environment.

pre-commit run --all-files

nina-msft avatar Oct 09 '24 19:10 nina-msft

@SnehaDharne let us know if you have any follow-up questions :-) We're ready to merge this work once the pre-commit hooks are run through on your end.

nina-msft avatar Oct 15 '24 19:10 nina-msft

Hi, yes I am able to run the pre-commit hook successfully.

SnehaDharne avatar Oct 15 '24 19:10 SnehaDharne

Hey @SnehaDharne - did the pre-commit hooks make any changes to your code that you might need to push? I just reran this PR through the CI and am seeing it fail here: https://github.com/Azure/PyRIT/actions/runs/11374109930/job/31691266872?pr=437

trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook
Fixing doc/code/orchestrators/hf_harmful_dataset_testing.py
Fixing pyrit/datasets/fetch_example_datasets.py
fix end of files.........................................................Passed
check yaml...............................................................Passed
check for added large files..............................................Passed
detect private key.......................................................Passed
black....................................................................Failed
- hook id: black
- files were modified by this hook
reformatted pyrit/datasets/__init__.py
reformatted pyrit/datasets/fetch_example_datasets.py
All done! ✨ 🍰 ✨
2 files reformatted, 324 files left unchanged.
flake8...................................................................Failed
- hook id: flake8
- exit code: 1
pyrit/datasets/fetch_example_datasets.py:433:121: E501 line too long (122 > 120 characters)
Check Links in Python and md Files.......................................Passed
pylint...................................................................Passed
mypy.....................................................................Passed

When you run the hooks locally, some will fix the files automatically, but those changes still need to be committed and pushed. From the results above, the trim trailing whitespace and black hooks will generate their own changes, but the flake8 E501 error (line too long at line 433, column 121 of pyrit/datasets/fetch_example_datasets.py) is one that you'll have to fix manually and push.
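
To spot offending lines before pushing, a check along these lines can help. It is only an illustrative sketch of what E501 measures (line length against a limit, 120 here as in the CI output above), not flake8's implementation, and `find_long_lines` is a hypothetical helper name.

```python
from typing import List, Tuple


def find_long_lines(source: str, limit: int = 120) -> List[Tuple[int, int]]:
    """Return (line_number, length) for every line longer than `limit` characters."""
    return [
        (number, len(line))
        for number, line in enumerate(source.splitlines(), start=1)
        if len(line) > limit
    ]


# A single 122-character assignment, the same length as the line flagged in the log.
too_long = 'message = "' + "a" * 110 + '"'

# The same string split across lines using implicit string concatenation.
wrapped = 'message = (\n    "' + "a" * 55 + '"\n    "' + "a" * 55 + '"\n)'

print(find_long_lines(too_long))  # [(1, 122)]
print(find_long_lines(wrapped))   # []
```

Wrapping the expression in parentheses, as in `wrapped`, is one common manual fix. Running `pre-commit run flake8 --all-files` afterwards reruns only that hook to confirm the error is gone.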

If you're not seeing this, could you post your results in a comment here and we can investigate what might be going on with our code? Thanks!

nina-msft avatar Oct 17 '24 17:10 nina-msft

[Screenshot 2024-10-17 at 4 55 28 PM] I made the fix, but I am not able to push the changes because pulling from main introduced a lot of changes into my branch.

SnehaDharne avatar Oct 17 '24 20:10 SnehaDharne

Sorry @SnehaDharne, I'm not quite following. Are you unable to push because of merge conflicts with main?

I can look into making these changes for you and pushing to your branch if you're having issues - then we still maintain that you're the author of this work :-) I'll take a look tomorrow!

nina-msft avatar Oct 17 '24 23:10 nina-msft

Okay, I have pushed the changes. Let me know if the issue still persists.

SnehaDharne avatar Oct 18 '24 04:10 SnehaDharne

Merged - thank you @SnehaDharne 🥳

nina-msft avatar Oct 18 '24 21:10 nina-msft