Fill `illappct` gaps
Part of #929
Between October 22, 2019 and May 21, 2020 we have 0 documents. We are missing around 1600 documents (152 per page, 11 full pages)
Between May 29, 2021 and November 15, 2021 we have 0 documents. We are missing around 1350 documents
To fill these gaps, this PR implements a dynamic backscraper.
Looks like we won't need to worry about the 2015/6 stuff and the HTML stuff based on your research.
For illappct, most rows from 2010 and earlier have no citation string, so the current scraper can't extract the docket number. Supporting those years would require enhancing the scraper.
However, it seems we do not need to backscrape earlier years for this source, since we already have a lot of data for illappct. For example, for 2006 the source returns 5 pages, which is at most 750 records. On CL we have ~2800 opinions for this court in the same period, which is nearly 4x the amount in the source...
A quick check shows that we have some duplicates:
Example 1: a, b Example 2: a, b
Still, that doesn't explain the 4x amount...
(I am closing and reopening the PR since it is bugged again and isn't picking up the last commit I pushed)
Commands to fill the gaps:
docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.illappct --backscrape-start=10/21/2019 --backscrape-end=05/20/2020
docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.illappct --backscrape-start=05/28/2021 --backscrape-end=11/16/2021
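If more gaps turn up later, the two commands above could be generated from a list of date ranges instead of being typed by hand. This is only a sketch: `GAPS` and `build_command` are hypothetical names, and the only flags used are the ones already shown in the commands above.

```python
import shlex
from datetime import date

# Gap boundaries copied from the two commands above.
GAPS = [
    (date(2019, 10, 21), date(2020, 5, 20)),
    (date(2021, 5, 28), date(2021, 11, 16)),
]

COURT = "juriscraper.opinions.united_states.state.illappct"


def build_command(start: date, end: date) -> str:
    """Return the docker command that backscrapes one date range."""
    args = [
        "docker", "exec", "-it", "cl-django",
        "python", "/opt/courtlistener/manage.py",
        "cl_back_scrape_opinions",
        "--courts", COURT,
        # Dates use the MM/DD/YYYY format seen in the commands above.
        f"--backscrape-start={start:%m/%d/%Y}",
        f"--backscrape-end={end:%m/%d/%Y}",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    for start, end in GAPS:
        print(build_command(start, end))
```

Keeping the ranges in one list makes it easy to review which periods were backscraped.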
For the full date range we now have 2899 documents, very close to the estimate of 2950 documents.
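For reference, the comparison can be reproduced with a few lines of arithmetic. This assumes the 2950 figure is the sum of the two per-gap estimates given earlier (1600 and 1350) and that 2899 counts the documents now present across both gaps; the variable names are illustrative only.

```python
# Figures quoted in this PR description.
missing_gap_1 = 1600  # estimate for Oct 2019 to May 2020
missing_gap_2 = 1350  # estimate for May 2021 to Nov 2021
ingested = 2899       # documents present after the backscrape

expected = missing_gap_1 + missing_gap_2
shortfall = expected - ingested
coverage = ingested / expected

print(f"expected={expected} shortfall={shortfall} coverage={coverage:.1%}")
# prints: expected=2950 shortfall=51 coverage=98.3%
```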
From the logs, 130 documents were skipped due to having no URL:
WARNING Opinion '2021 IL App (1st) 161797-U' has no URL. (Likely a withdrawn opinion).
4 were skipped due to having no docket:
WARNING Opinion '2021 IL (2d) 200636' has no docket.
There were also some cases where the URL was broken:
ERROR UnexpectedContentTypeError: https://www.illinoiscourts.gov/resources/37ca7e26-546f-4cc6-b546-66a3e3a4e72a/file
ERROR UnexpectedContentTypeError: https://www.illinoiscourts.gov/resources/27e44736-1c31-4ede-80fa-0ca5b71d25ad/file
ERROR UnexpectedContentTypeError: https://www.illinoiscourts.gov/resources/9c8cfe9a-e85d-44cb-bedf-bf818f13e9c6/file
ERROR UnexpectedContentTypeError: https://www.illinoiscourts.gov/resources/cd715f77-1b51-46a8-a83e-1e6fe8c969f4/file
ERROR UnexpectedContentTypeError: https://www.illinoiscourts.gov/resources/2dc04981-6a2a-4f6e-bac4-3ecb479bb2da/file
ERROR UnexpectedContentTypeError: https://www.illinoiscourts.gov/resources/a53f4980-6011-4317-9cfc-3d19add9ee6f/file
ERROR UnexpectedContentTypeError: https://www.illinoiscourts.gov/resources/88d8c0da-85da-417d-b5a0-fa437444c729/file
ERROR UnexpectedContentTypeError: https://www.illinoiscourts.gov/resources/959f4646-289a-405e-ac2a-93af38bfec77/file
ERROR UnexpectedContentTypeError: https://www.illinoiscourts.gov/resources/ea39977b-5b95-4cc4-8302-ae6b389f1bbd/file
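The counts above can be re-derived from a saved log with a small tally script. This is a rough sketch: the patterns are based only on the three message shapes quoted above, the real log lines may carry extra prefixes such as timestamps, and `PATTERNS`, `tally` and `SAMPLE` are hypothetical names.

```python
import re
from collections import Counter

# One pattern per skip reason, matched against the message text shown above.
PATTERNS = {
    "no_url": re.compile(r"WARNING Opinion '.+' has no URL"),
    "no_docket": re.compile(r"WARNING Opinion '.+' has no docket"),
    "bad_content_type": re.compile(r"ERROR UnexpectedContentTypeError: \S+"),
}


def tally(lines):
    """Count log lines per skip reason; lines matching nothing are ignored."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
                break
    return counts


# Sample lines copied from the log excerpts above.
SAMPLE = [
    "WARNING Opinion '2021 IL App (1st) 161797-U' has no URL. (Likely a withdrawn opinion).",
    "WARNING Opinion '2021 IL (2d) 200636' has no docket.",
    "ERROR UnexpectedContentTypeError: https://www.illinoiscourts.gov/resources/37ca7e26-546f-4cc6-b546-66a3e3a4e72a/file",
]

if __name__ == "__main__":
    for label, count in tally(SAMPLE).items():
        print(f"{label}: {count}")
```

To run it on a full log, pass an open file object to `tally` in place of `SAMPLE`.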