
Fill `coloctapp` gaps

Open grossir opened this issue 1 year ago • 8 comments

From #929, related to #974.

coloctapp

Between September 29, 2021 and February 2, 2022 we have 0 documents. We are missing documents, but we must now go into the PDFs to get them.
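One way to locate gaps like this from a list of scraped opinion dates is to flag any two consecutive dates that sit unusually far apart. This is a minimal sketch. `find_gaps`, the 14-day threshold, and the sample dates are hypothetical and do not come from the juriscraper or CourtListener code:

```python
from datetime import date, timedelta


def find_gaps(dates, max_days=14):
    """Return (first_missing_day, last_missing_day) pairs wherever two
    consecutive opinion dates are more than max_days apart.

    The 14-day default is an arbitrary, hypothetical threshold.
    """
    ordered = sorted(set(dates))
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if (nxt - prev).days > max_days:
            # The gap covers the days strictly between the two known dates.
            gaps.append((prev + timedelta(days=1), nxt - timedelta(days=1)))
    return gaps


# Hypothetical sample dates, chosen only to illustrate the output shape.
sample = [date(2021, 9, 16), date(2021, 9, 28), date(2022, 2, 3), date(2022, 2, 10)]
print(find_gaps(sample))
# → [(datetime.date(2021, 9, 29), datetime.date(2022, 2, 2))]
```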

grossir • Apr 02 '24 13:04

@flooie can you please check this source? From what I see we may need to parse the PDFs, since old case information is not available in the HTML, as it is for `colo`.

grossir • Apr 04 '24 23:04

Hmm. Is it possible the HTML changed?

flooie • Apr 04 '24 23:04

Some news about coloctapp: the Colorado Courts have just (well, on March 1, 2024) launched a new site for Appellate Opinions, and it actually has past opinions in HTML. We could implement the backscraper from there instead of dealing with PDFs.

Check it out here

grossir • Apr 10 '24 21:04

CO was one of the worst states. Does this mean it's finally not so terrible?

mlissner • Apr 10 '24 21:04

This new Colorado site seems to have no search filters except for "court". Getting the document URL requires more steps/requests, and the site uses vLex as its backend. The downloaded opinion PDF comes in a zip, and the document has a vLex link in it.

So, I don't know if it qualifies as not being terrible, but at least it will let us look for past opinions without going into the PDFs.
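If the backscraper pulls opinions from the new site, the zip wrapper has to be opened before the PDF can be processed. This is a minimal sketch using Python's standard `zipfile` module, assuming each archive holds the opinion as a `.pdf` member. `extract_first_pdf` is a hypothetical helper, and the real archive layout has not been checked:

```python
import io
import zipfile


def extract_first_pdf(zip_bytes):
    """Return (filename, pdf_bytes) for the first .pdf member of a zip archive.

    Assumption: the downloaded archive contains the opinion as a PDF file;
    the actual layout of the site's zips is unverified.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".pdf"):
                return name, archive.read(name)
    raise ValueError("no PDF found in archive")


# Usage: an in-memory archive stands in for a real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("readme.txt", "not the opinion")
    archive.writestr("opinion.pdf", b"%PDF-1.4 placeholder")
name, content = extract_first_pdf(buffer.getvalue())
print(name, content[:8])
```

Removing the embedded vLex link from the PDF text would be a separate step and is not covered here.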

grossir • Apr 10 '24 21:04

So, we also have a more recent gap. We are missing every opinion announced on:

  • April 4, 2024 (2 opinions)
  • March 14, 21, and 28 (2, 1, and 3 opinions, respectively)
  • February 1, 15, 22, and 29 (...)

I don't know why the scraper has been failing...
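Every date listed above falls on a Thursday, which suggests (without checking it against the court's published schedule) that opinions are announced weekly on Thursdays. Under that assumption, missing announcement days can be found by comparing the expected Thursdays against the dates already scraped. The helper names and the scraped dates in this sketch are hypothetical:

```python
from datetime import date, timedelta


def expected_thursdays(start, end):
    """Yield every Thursday from start to end, inclusive."""
    # date.weekday(): Monday == 0, so Thursday == 3.
    current = start + timedelta(days=(3 - start.weekday()) % 7)
    while current <= end:
        yield current
        current += timedelta(weeks=1)


def missing_announcement_days(scraped, start, end):
    """Thursdays in [start, end] with no scraped opinion (assumes a Thursday schedule)."""
    have = set(scraped)
    return [day for day in expected_thursdays(start, end) if day not in have]


# Hypothetical: suppose only Feb 8 and March 7 were scraped in this window.
scraped = [date(2024, 2, 8), date(2024, 3, 7)]
for day in missing_announcement_days(scraped, date(2024, 2, 1), date(2024, 4, 4)):
    print(day.isoformat())
```

With those two scraped dates, the output is the same set of days listed above. Thursdays on which the court genuinely released nothing would also show up, so the result is a list to check by hand rather than proof that the scraper failed.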

grossir • Apr 10 '24 23:04

Command to fill the gaps:

docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.coloctapp --backscrape-start=09/28/2021 --backscrape-end=02/01/2022
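The command takes its range as MM/DD/YYYY strings. For a multi-month backscrape, one option is to split the range into smaller windows so that a failure partway through only affects one window. This is a minimal sketch. `backscrape_windows` and the 30-day window size are hypothetical and are not options of `cl_back_scrape_opinions`:

```python
from datetime import datetime, timedelta


def backscrape_windows(start, end, days=30):
    """Split an inclusive MM/DD/YYYY date range into consecutive windows.

    Hypothetical helper: the window size is an illustrative tuning knob,
    not a flag of cl_back_scrape_opinions.
    """
    first = datetime.strptime(start, "%m/%d/%Y").date()
    last = datetime.strptime(end, "%m/%d/%Y").date()
    windows = []
    while first <= last:
        # Each window spans `days` calendar days, clipped at the range end.
        window_end = min(first + timedelta(days=days - 1), last)
        windows.append((first, window_end))
        first = window_end + timedelta(days=1)
    return windows


for window_start, window_end in backscrape_windows("09/28/2021", "02/01/2022"):
    print(window_start.isoformat(), window_end.isoformat())
```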

grossir • May 08 '24 16:05

The old scraper went down some months ago. The most recent coloctapp opinion is from March 7, 2024, so this is a new gap.

grossir • Jul 11 '24 14:07