feat: add error handling for scrapers with expected results

Open Luis-manzur opened this issue 8 months ago • 4 comments

This pull request introduces error handling for scrapers that are expected to return results but fail to do so. The changes include updates to the CHANGES.md file to document the new feature, as well as modifications to the AbstractSite class in juriscraper to implement the functionality.

Documentation Updates:

  • CHANGES.md: Added a note under "Features" about the new error handling for scrapers with expected results.

Code Enhancements:

  • juriscraper/AbstractSite.py:
    • Added a new should_have_results attribute in the __init__ method to indicate whether a scraper is expected to return results.
    • Updated the _check_sanity method to log an error if should_have_results is True and no results are returned, while maintaining a warning for cases where results are not required.
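The behavior described above can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in juriscraper/AbstractSite.py: the class names `AbstractSiteSketch` and `AlwaysPopulatedCourt`, the `cases` list, the log messages, and the returned log level are all assumptions made for the example. Only the `should_have_results` attribute and the error-versus-warning split in `_check_sanity` come from the description.

```python
import logging

logger = logging.getLogger("sketch.AbstractSite")


class AbstractSiteSketch:
    """Hypothetical stand-in for juriscraper's AbstractSite (not the real class)."""

    def __init__(self):
        # Attribute described in the PR; scrapers opt in by setting it to True.
        self.should_have_results = False
        # Assumed container for scraped results, used only in this sketch.
        self.cases = []

    def _check_sanity(self):
        """Log when nothing was scraped and return the level used, or None.

        Returning the level is an addition for this sketch so the behavior
        is easy to check; the description does not say the real method
        returns anything.
        """
        if self.cases:
            return None
        name = type(self).__name__
        if self.should_have_results:
            # Results were expected, so an empty run is treated as an error.
            logger.error("%s: returned no results, but results are expected", name)
            return logging.ERROR
        # Results are optional for this scraper, so keep the softer warning.
        logger.warning("%s: returned no results", name)
        return logging.WARNING


class AlwaysPopulatedCourt(AbstractSiteSketch):
    """Hypothetical scraper whose court page always lists opinions."""

    def __init__(self):
        super().__init__()
        self.should_have_results = True
```

Under these assumptions, a scraper that leaves the default in place keeps getting a warning on an empty run, while one that opts in, like `AlwaysPopulatedCourt`, gets an error instead.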

Luis-manzur avatar Jun 16 '25 21:06 Luis-manzur

For this to be useful, you will have to go through the scrapers one by one, identify those that "should_have_results", and set that attribute to True.

I was wondering if we could do separate issues to not overload this PR.

Luis-manzur avatar Jun 17 '25 15:06 Luis-manzur

I think I agree with @Luis-manzur that adding this field should be a separate PR.

flooie avatar Jun 17 '25 18:06 flooie

I'd prefer all the changes to be together for the reasons below, but feel free to approve it and merge it as is:

  • the PR is not cluttered as it is right now, it's a few lines in a single file
  • changes don't really affect anything without changing the relevant scraper files, so there is not much to review here
  • you will need to open, link to the issue (or another one), and review another PR instead of doing it now, which is more clerical work

grossir avatar Jun 17 '25 20:06 grossir

How did you find which ones to update?

flooie avatar Jun 23 '25 17:06 flooie

@Luis-manzur can you resolve conflicts and respond to my question?

flooie avatar Jul 02 '25 15:07 flooie

To identify the sites that needed this update, I went through the code of each one, checking that there was no filtering before or after the first request, and confirmed this by visiting the court page. I also left out sites whose court page doesn't need any filtering but that clear the opinion list each month/year.

Luis-manzur avatar Jul 02 '25 19:07 Luis-manzur