juriscraper
Add validations for case names, dates, and download URLs in _check_sanity
This pull request introduces new validations to the _check_sanity method in AbstractSite to enhance data integrity checks and improve error handling.
Enhancements to _check_sanity validations:
- Added checks for suspicious file extensions in download_urls, using a regular expression to detect potentially unsafe or unexpected file types.
- Introduced validation for forbidden characters in case_names, logging warnings when detected.
- Added a new sanity check to ensure case_dates are not earlier than the year 1900, raising an exception for invalid dates.
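The three checks listed above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in this PR: the function name `check_sanity`, the `SanityError` exception, and both regular expressions are hypothetical stand-ins, and logging a warning (rather than raising) for suspicious URLs is an assumption, since the description only says what happens for case names and dates.

```python
import logging
import re
from datetime import date

logger = logging.getLogger(__name__)

# Hypothetical patterns; the real lists of extensions and characters may differ.
SUSPICIOUS_EXTENSION = re.compile(r"\.(exe|bat|sh|js|zip)$", re.IGNORECASE)
FORBIDDEN_NAME_CHARS = re.compile(r"[<>{}\[\]|\\^`]")
MIN_YEAR = 1900


class SanityError(Exception):
    """Stand-in for whatever exception the scraper raises on invalid data."""


def check_sanity(case_names, case_dates, download_urls):
    """Return a list of warning messages; raise SanityError on pre-1900 dates."""
    warnings = []

    for url in download_urls:
        # Ignore the query string and fragment so "?x=1" does not hide the extension.
        path = url.split("?", 1)[0].split("#", 1)[0]
        if SUSPICIOUS_EXTENSION.search(path):
            msg = f"Suspicious file extension in download URL: {url}"
            logger.warning(msg)
            warnings.append(msg)

    for name in case_names:
        if FORBIDDEN_NAME_CHARS.search(name):
            msg = f"Forbidden character in case name: {name!r}"
            logger.warning(msg)
            warnings.append(msg)

    for case_date in case_dates:
        if case_date.year < MIN_YEAR:
            raise SanityError(f"Case date {case_date} is earlier than {MIN_YEAR}")

    return warnings
```

Called with a clean record, for example `check_sanity(["Smith v. Jones"], [date(2020, 1, 1)], ["https://example.com/op.pdf"])`, the sketch returns an empty list. Keeping the date check as the only one that raises mirrors the split in the description between warnings and hard failures.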
Looks like we added validation for URL endings. Can we remove this, please, and focus just on the validation for dates?
I would want to add more tests and do further research into the other components first, and I think including them complicates this PR.