feat: add error handling for scrapers with expected results
This pull request introduces error handling for scrapers that are expected to return results but fail to do so. The changes include updates to the CHANGES.md file to document the new feature, as well as modifications to the AbstractSite class in juriscraper to implement the functionality.
Documentation Updates:
- CHANGES.md: Added a note under "Features" about the new error handling for scrapers with expected results.

Code Enhancements:
- juriscraper/AbstractSite.py:
  - Added a new `should_have_results` attribute in the `__init__` method to indicate whether a scraper is expected to return results.
  - Updated the `_check_sanity` method to log an error if `should_have_results` is `True` and no results are returned, while maintaining a warning for cases where results are not required.
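
For reviewers skimming the thread, the behavior described above could look roughly like the sketch below. It is a simplified stand-in, not the actual `AbstractSite` code: the class names `AbstractSiteSketch` and `AlwaysPopulatedSite`, the logger name, and the log messages are assumptions made for illustration, and the real `_check_sanity` performs other checks that are omitted here.

```python
import logging

# Hypothetical logger name; the real project configures its own logger.
logger = logging.getLogger("juriscraper_sketch")


class AbstractSiteSketch:
    """Simplified stand-in for AbstractSite, reduced to the new flag."""

    def __init__(self):
        self.court_id = self.__module__
        # New attribute: scrapers opt in when an empty result is abnormal.
        self.should_have_results = False
        self.case_names = []

    def _check_sanity(self):
        """Log at error level only when results were expected but missing."""
        if len(self.case_names) == 0:
            if self.should_have_results:
                logger.error(
                    "%s: Returned with zero items, but should have results.",
                    self.court_id,
                )
            else:
                logger.warning("%s: Returning with zero items.", self.court_id)


class AlwaysPopulatedSite(AbstractSiteSketch):
    """Hypothetical scraper whose court page is never expected to be empty."""

    def __init__(self):
        super().__init__()
        self.should_have_results = True
```

With this shape, an individual scraper only needs to set `should_have_results = True` in its own `__init__`; scrapers that leave the default keep the existing warning.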
For this to be useful, you will have to go through the scrapers one by one, identify those that should have results, and set `should_have_results` to `True` on each of them.
I was wondering if we could do separate issues to not overload this PR.
I think I agree with @Luis-manzur that adding this field should be a separate PR.
I'd prefer all the changes to be together for the reasons below, but feel free to approve it and merge it as is:
- the PR is not cluttered as it is right now; it's a few lines in a single file
- changes don't really affect anything without changing the relevant scraper files, so there is not much to review here
- you will need to open, link to the issue (or another one), and review another PR instead of doing it now, which is more clerical work
How did you find which ones to update?
@Luis-manzur can you resolve conflicts and respond to my question?
To identify the sites that needed this update, I went through the code of each one to check that there was no filtering before or after the first request, and confirmed this by visiting the court's page. I also left out sites whose court page doesn't need any filtering but clears its opinion list each month or year.