juriscraper

Fill `tex` gaps

Open • grossir opened this issue 2 years ago • 1 comment

Related to #929

Between May 5, 2020 and August 20, 2020, we have only 2 documents. We are missing 47.

This will require updating `tex` to handle backscrapes and pagination. The same changes will also support filling the gaps for `texapp` and `texcrimapp`.

grossir, Feb 21 '24
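A minimal sketch of what pagination handling could look like, assuming the site splits results into numbered pages and that an empty page marks the end of the results. `fetch_page` and `scrape_all_pages` are hypothetical names used for illustration. They are not juriscraper's actual API, and the real `tex` scraper may detect the last page differently.

```python
from typing import Callable, Iterator, List

# Hypothetical page fetcher: takes a 1-based page number and returns
# the opinion URLs found on that page (empty list when past the end).
PageFetcher = Callable[[int], List[str]]


def scrape_all_pages(fetch_page: PageFetcher, max_pages: int = 100) -> Iterator[str]:
    """Yield results from consecutive pages until an empty page is returned.

    max_pages is a safety cap (an assumption, not a documented site limit)
    so a site that never returns an empty page cannot loop forever.
    """
    for page in range(1, max_pages + 1):
        results = fetch_page(page)
        if not results:
            # Assumed convention: an empty page means there are no more results
            return
        yield from results


if __name__ == "__main__":
    # Fake two-page result set standing in for real HTTP requests
    fake_pages = {1: ["opinion-a", "opinion-b"], 2: ["opinion-c"]}
    print(list(scrape_all_pages(lambda p: fake_pages.get(p, []))))
```

Stopping on an empty page keeps the loop independent of how many results the site claims to have; the `max_pages` cap is a defensive choice in case that assumption does not hold.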

`docker exec -it cl-django python /opt/courtlistener/manage.py cl_back_scrape_opinions --courts juriscraper.opinions.united_states.state.tex --backscrape-start=05/05/2020 --backscrape-end=08/20/2020`

grossir, May 07 '24

There is no longer a gap; we now have 50 documents for that period.

However, I think there may be a bug in the backscraper. On the last element of the back-scrape iterable, it re-scraped everything from the backscrape end date up to the present, 54 pages in total. On the upside, this downloaded some missing opinions: `DEBUG juriscraper.opinions.united_states.state.tex: Successfully crawled 17/584 opinions.`

grossir, Jul 29 '24
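The thread does not identify the cause of the re-scrape. One plausible mechanism, offered only as a hypothesis, is that the final window of the back-scrape iterable is left open-ended, so the scraper falls back to its default end date (today). The sketch below builds the date windows so that every window, including the last one, is clamped to the requested end date. `make_date_windows` is a hypothetical helper, not juriscraper's actual implementation.

```python
from datetime import date, timedelta
from typing import List, Tuple


def make_date_windows(start: date, end: date, days: int) -> List[Tuple[date, date]]:
    """Split [start, end] (inclusive) into consecutive windows of at most `days` days.

    Every window, including the last, is clamped to `end`, so no request
    reaches past the backscrape end date (e.g. up to the present).
    """
    if days < 1:
        raise ValueError("days must be >= 1")
    windows = []
    current = start
    while current <= end:
        # Clamp the window end instead of leaving the last window open-ended
        window_end = min(current + timedelta(days=days - 1), end)
        windows.append((current, window_end))
        current = window_end + timedelta(days=1)
    return windows


if __name__ == "__main__":
    # The range from the backscrape command above, in 30-day windows
    for window in make_date_windows(date(2020, 5, 5), date(2020, 8, 20), 30):
        print(window)
```

For the May 5 to August 20, 2020 range this yields four windows, the last being August 3 to August 20 rather than August 3 to today.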