agentic_security

Add pytest tests for dataset URL accessibility

Open fardin-developer opened this issue 1 year ago • 5 comments

issue #72

This PR introduces unit tests in test_registry.py to ensure the datasets in REGISTRY are accessible and functioning correctly. The tests focus on validating the availability of the dataset URLs, specifically checking if the Hugging Face links are active and returning the expected responses. This helps proactively identify broken or inaccessible datasets, contributing to smoother maintenance and management of the dataset registry.

fardin-developer avatar Jan 23 '25 17:01 fardin-developer

Hi @fardin-developer, thanks for the PR. Could you please take a look at the unit test failures on the CI?

msoedov avatar Jan 23 '25 20:01 msoedov

Hi, the test cases are failing because the Hugging Face requests return 401 Unauthorized. Do I need to send any specific headers with the requests? Example URL: https://huggingface.co/ShawnMenz/DAN_jailbreak

fardin-developer avatar Jan 24 '25 10:01 fardin-developer

Hey @fardin-developer, I just added HUGGINGFACE_API_KEY to the repo secrets, so it will be accessible in GitHub Actions. Locally, I guess you need to either generate your own token at https://huggingface.co/settings/tokens for testing, or install and run huggingface-cli login.
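
For the header question, a minimal sketch of how the token could be attached to plain HTTP requests, assuming it is exposed to the test process as an environment variable named HUGGINGFACE_API_KEY (the helper name hf_auth_headers is hypothetical and not part of the repo):

```python
import os


def hf_auth_headers(env=os.environ):
    """Build request headers carrying a Hugging Face token, if one is set.

    Assumption: the token lives in the HUGGINGFACE_API_KEY variable.
    Returns an empty dict when no token is configured, so anonymous
    requests still work for public datasets.
    """
    token = env.get("HUGGINGFACE_API_KEY")
    if not token:
        return {}
    # Hugging Face accepts user access tokens as a Bearer token.
    return {"Authorization": f"Bearer {token}"}
```

The result can then be passed along with each request, for example requests.get(url, headers=hf_auth_headers(), timeout=10).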

I would suggest replacing the HTTP GET requests with load_dataset (from datasets import load_dataset).

Thank you for your effort; that's great progress!

msoedov avatar Jan 24 '25 10:01 msoedov

I just wanted to mention that all the URLs inside agentic_security/probe_data/__init__.py are invalid and return 404 errors. I tried opening them in the browser as well, and they didn't work. However, I tested with a valid URL, https://huggingface.co/datasets/fka/awesome-chatgpt-prompts, and it worked without any issues.

fardin-developer avatar Jan 24 '25 11:01 fardin-developer

@fardin-developer let me pull your branch and look into that

msoedov avatar Jan 24 '25 21:01 msoedov