juriscraper

Fill `afcca` gap

Open grossir opened this issue 1 year ago • 1 comments

Part of #929

Between February 1st, 2021 and February 1st, 2023, we have 0 documents in CL. We are missing more than 250 documents from 2021 and 2022 (198).

grossir avatar Apr 16 '24 14:04 grossir

Command to fill the gap:

docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.federal_special.afcca --backscrape-start=2021 --backscrape-end=2023

grossir avatar Apr 16 '24 23:04 grossir

After Ramiro ran the backscrape, we have a single missing document for 2021, the one from 7 Apr 2021. I got the hash manually and, again, the hash already existed in CourtListener: the document was duplicated.

No missing docs in 2022

For 2023 I got the following mismatches:

Error in 2023-06-27: 2 in site v 3 in db
Error in 2023-04-10: 1 in site v 2 in db
Error in 2023-02-03: 4 in site v 2 in db
Error in 2023-01-26: 3 in site v 2 in db
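A per-date comparison like the one above can be sketched as follows. This is only an illustration, not the script that produced these lines: the function name `compare_counts` and the input shape (one ISO date string per document, for each source) are assumptions, and the sample data just mirrors three of the dates listed.

```python
from collections import Counter


def compare_counts(site_dates, db_dates):
    """Return one message per date whose document count differs.

    Both arguments are iterables of ISO date strings, one entry per
    document (hypothetical inputs; how they are collected is not shown).
    """
    site = Counter(site_dates)
    db = Counter(db_dates)
    messages = []
    # Walk every date seen in either source, newest first
    for day in sorted(set(site) | set(db), reverse=True):
        if site[day] != db[day]:
            messages.append(
                f"Error in {day}: {site[day]} in site v {db[day]} in db"
            )
    return messages


# Sample data mirroring three of the dates reported in this issue
site_dates = ["2023-06-27"] * 2 + ["2023-04-10"] + ["2023-02-03"] * 4
db_dates = ["2023-06-27"] * 3 + ["2023-04-10"] * 2 + ["2023-02-03"] * 2
mismatch_messages = compare_counts(site_dates, db_dates)
```

A `Counter` returns 0 for missing keys, so dates present in only one source are reported as well.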

For 2023-01-26, 1 and 2 have the same document / hash

For 2023-02-03, 3 of the opinions have the same hash
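Documents that share a hash can be spotted by hashing the downloaded bytes and grouping by digest. A minimal sketch, assuming SHA-1 over the raw file content; the helper name `group_by_hash` and the in-memory `documents` mapping are hypothetical and only for illustration.

```python
import hashlib
from collections import defaultdict


def group_by_hash(documents):
    """Group document names by the SHA-1 digest of their content.

    `documents` maps a name to its raw bytes (hypothetical input).
    Only digests shared by more than one document are returned.
    """
    groups = defaultdict(list)
    for name, content in documents.items():
        digest = hashlib.sha1(content).hexdigest()
        groups[digest].append(name)
    # Keep only the digests that point at duplicated content
    return {d: names for d, names in groups.items() if len(names) > 1}


documents = {
    "a.pdf": b"same bytes",
    "b.pdf": b"same bytes",
    "c.pdf": b"other bytes",
}
duplicates = group_by_hash(documents)
```

Each remaining group lists entries that the site publishes separately but that are the same file, which explains a site count higher than the db count.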

For 2023-04-10 and 2023-06-27, where there are more clusters in the db than on the site, it seems that opinions previously published on the site have since disappeared. For 2023-04-10, there are two: 1, 2. US vs Lara no longer appears on the source on that date, but it is referenced inside a more recent opinion:

On 10 April 2023, we issued an unpublished opinion where we found that Appellant’s pleas of guilty were not knowing, intelligent acts done with sufficient awareness of the relevant circumstances and likely consequences.

Knowing why the counts don't match, I am closing this issue as completed

grossir avatar May 30 '24 00:05 grossir