Add pytest tests for dataset URL accessibility
issue #72
This PR adds unit tests in test_registry.py that check whether the datasets listed in REGISTRY are accessible. The tests request each dataset's Hugging Face URL and verify that the link is live and returns the expected response. This catches broken or inaccessible datasets early and makes the registry easier to maintain.
Hi @fardin-developer, thanks for the PR. Could you please take a look at the unit test failures on the CI?
Hi, the test cases are failing because the Hugging Face requests return 401 Unauthorized. Do I need to send any specific headers with the requests? Example URL: https://huggingface.co/ShawnMenz/DAN_jailbreak
Hey @fardin-developer, I just added HUGGINGFACE_API_KEY to the repo secret that will be accessible in GitHub Actions.
I guess locally you need to either generate your own token at https://huggingface.co/settings/tokens for testing, or install and run huggingface-cli login.
I would suggest replacing the HTTP GET requests with load_dataset (from datasets import load_dataset).
Thank you for your effort; that's great progress!
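For reference, the load_dataset suggestion above could look roughly like the sketch below. This is only an illustration, not the code in this PR: the helper name dataset_is_loadable and the injectable loader argument are hypothetical, reading the token from the HUGGINGFACE_API_KEY environment variable is an assumption about how the secret is exposed to the tests, and the token= and streaming= keyword arguments assume a reasonably recent version of the datasets library.

```python
import os
from typing import Callable, Optional


def dataset_is_loadable(
    repo_id: str,
    loader: Optional[Callable[..., object]] = None,
    token: Optional[str] = None,
) -> bool:
    """Return True if `loader` can open the dataset, False if it raises.

    `loader` defaults to datasets.load_dataset; it is injectable (a
    hypothetical design choice) so the check can be tested offline.
    """
    if loader is None:
        # Imported lazily so this module imports even without `datasets`.
        from datasets import load_dataset

        loader = load_dataset

    # Assumption: the CI secret is exposed as this environment variable.
    token = token or os.environ.get("HUGGINGFACE_API_KEY")

    try:
        # streaming=True avoids downloading the full dataset just to
        # confirm that it can be resolved.
        loader(repo_id, streaming=True, token=token)
    except Exception:
        return False
    return True
```

In a pytest suite this could be wrapped in a parametrized test over the registry entries, asserting that dataset_is_loadable(repo_id) is true for each one.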
I just wanted to mention that all the URLs inside agentic_security/probe_data/init.py are invalid and returning 404 errors. I tried opening them in the browser as well and they didn’t work. However, I tested with a valid URL like https://huggingface.co/datasets/fka/awesome-chatgpt-prompts and it worked fine without any issues.
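One thing that might be worth checking here (an assumption, not verified against the registry): the URL that worked has a /datasets/ segment before the repository id, while the failing example earlier in the thread does not. A minimal sketch of building the dataset page URL from a repository id, using a hypothetical helper name:

```python
HF_BASE = "https://huggingface.co"


def dataset_page_url(repo_id: str) -> str:
    """Build the Hugging Face dataset page URL for an `owner/name` repo id."""
    # Dataset pages live under /datasets/, unlike model pages.
    return f"{HF_BASE}/datasets/{repo_id.strip('/')}"


print(dataset_page_url("fka/awesome-chatgpt-prompts"))
```

If the registry stores bare owner/name ids, a helper like this would keep the URL form consistent across the tests.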
@fardin-developer let me pull your branch and look into that