juriscraper

`ca1` source not showing most recent opinions

Open grossir opened this issue 1 year ago • 2 comments

The current scraper uses this endpoint, which shows an opinion published on February 2, 2024 as the most recent one. This is also the newest ca1 opinion that we have on Courtlistener. However, the RSS feeds show more recent opinions, dated between April 2 and April 30, 2024.

UPDATE: A new endpoint with fresh data is available here

I will implement a fix for current opinions, and a backscraper for the months we are missing.

@flooie

grossir, May 01 '24 21:05

Of note are the comments from one of the original authors, for which I sadly found no new place

            s = "1996/05/30"  # My life is thus lain to waste.
            s = s.replace("O1-", "01-")  # I grow older, the input grows worse.

grossir, May 01 '24 22:05

To collect the opinions we are missing from February to the present, run the command below. This requires bumping juriscraper and courtlistener first; the backscraper PR in courtlistener is already merged.

docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.federal_appellate.ca1 --backscrape-start=02/02/2024 --backscrape-end=06/11/2024
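To make the date range in the command above concrete, here is a minimal Python sketch that parses the two date arguments and splits the span into calendar-month windows. It assumes the arguments use MM/DD/YYYY, which matches the February 2 start date mentioned in the issue. `parse_backscrape_date` and `monthly_windows` are hypothetical helpers written for illustration; they are not part of juriscraper or courtlistener, and this is not a description of how the backscraper itself iterates.

```python
from datetime import date, datetime, timedelta

# Assumption: the --backscrape-* arguments are MM/DD/YYYY strings.
DATE_FORMAT = "%m/%d/%Y"


def parse_backscrape_date(value: str) -> date:
    """Parse a date string in the assumed MM/DD/YYYY format."""
    return datetime.strptime(value, DATE_FORMAT).date()


def monthly_windows(start: date, end: date):
    """Yield (window_start, window_end) pairs covering start..end inclusive,
    split on calendar-month boundaries."""
    current = start
    while current <= end:
        # Compute the first day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # The window ends on the last day of this month or at `end`,
        # whichever comes first.
        window_end = min(next_month - timedelta(days=1), end)
        yield current, window_end
        current = window_end + timedelta(days=1)


start = parse_backscrape_date("02/02/2024")
end = parse_backscrape_date("06/11/2024")
windows = list(monthly_windows(start, end))

for window_start, window_end in windows:
    print(window_start.isoformat(), "to", window_end.isoformat())
```

Splitting by month is only one way to reason about the gap; it makes it easy to check, month by month, that opinions were collected for the whole missing period.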

grossir, May 02 '24 18:05

The backscraper was run by Ramiro, and the gap has been closed. Closing this issue as completed.

grossir, Jun 12 '24 18:06

wahoo

flooie, Jun 12 '24 18:06