Rossi issues

Results: 172 issues of Rossi

feat(NewOpinionSite): support returning nested data structures

Related to #883
- add jsonschema dependencies
- create JSON Schemas for each scraped object, corresponding to courtlistener's Django Models
- validate scraped data using JSONSchemaValidator
- support nested objects...

feat(tex): support dynamic backscraping in tex

Assumes backscrape keyword arguments from the dynamic backscraping PR. Solves #944

Fill `tex` gaps

Related to #929. Between May 05, 2020 and August 20, 2020 we have [2 documents](https://www.courtlistener.com/?q=court_id%3Atex&type=o&order_by=dateFiled%20asc&stat_Precedential=on&stat_Non-Precedential=on&filed_after=05/05/2020&filed_before=08/20/2020). We are missing 47 documents. This will require updating `tex` to handle backscrapes, and...

Fill `ny` New York Court of Appeals gaps

This is part of #929. Missing around 100 documents. Between April 27, 2018 and February 13, 2019 we have [1 document](https://www.courtlistener.com/?q=court_id%3Any&type=o&order_by=dateFiled%20asc&stat_Precedential=on&stat_Non-Precedential=on&filed_after=04%2F27%2F2018&filed_before=02%2F13%2F2019). We are missing [92 documents](https://iapps.courts.state.ny.us/lawReporting/CourtOfAppealsSearch?searchType=opinion). Between June 16, 2023...

texag is buggy and creating duplicates

The scraper is picking the HTML "detail" page link instead of the PDF link. The XPath should be updated. We have a bunch of HTML pages and duplicates on CL...

bug

Begin Gap Analysis

I think there are 3 big classes of gaps:
- **0 gap**: when we have 0 documents for a time period, and have a regular count before and after. We...
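A minimal sketch of how the "0 gap" class could be detected, assuming document counts have already been bucketed into consecutive, equal-length periods (for example, one count per month). The function name `find_zero_gaps` is hypothetical, and "regular count" is simplified here to "any non-zero count"; the real analysis may use a stricter threshold.

```python
def find_zero_gaps(counts):
    """Return (start, end) index pairs for runs of zero-count periods that
    have a non-zero count immediately before and immediately after.

    Assumption: `counts` is a list of document counts for consecutive
    periods. Leading and trailing runs of zeros are ignored, because they
    lack a non-zero neighbor on one side.
    """
    gaps = []
    start = None  # index where the current run of zeros began
    for i, n in enumerate(counts):
        if n == 0:
            if start is None:
                start = i
        else:
            # A run just ended; keep it only if something preceded it.
            if start is not None and start > 0:
                gaps.append((start, i - 1))
            start = None
    return gaps
```

Each returned pair could then be mapped back to calendar dates to build a search URL like the ones used in the gap issues.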

alaska and alaskactapp missing opinions

The scraper is skipping all citable opinions. It skips rows which have no PDF links in the first column. Coincidentally, all the opinions with a citation string have no such...

Texas Supreme and Appellate scrapers enhancements

There is more data available in the HTML we already request that we don't parse (the scraping class is in `tex.py`). This is an instance of #889. At least these...

Support getting multiple fields from a secondary page

The current method to get data from a secondary page is to use a `DeferringList`. This method is designed for parsing a single field. However, we may want to get...
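One possible shape for this, sketched without touching the real `DeferringList` API: fetch the secondary page lazily on first access, parse every field at once, and serve later lookups from the cached result. The class `SecondaryPageFields`, the `fetch` callable, and the `<span id="...">` markup are all hypothetical and exist only for illustration.

```python
import re


class SecondaryPageFields:
    """Fetch a secondary page once, on first access, and expose several
    fields parsed from it. Hypothetical helper, not part of juriscraper."""

    def __init__(self, url, fetch):
        self.url = url
        self._fetch = fetch  # assumed callable: url -> HTML text
        self._fields = None  # filled in on the first call to get()

    def get(self, name):
        # Defer the request until a field is actually needed, then reuse
        # the parsed result for every other field.
        if self._fields is None:
            self._fields = self._parse(self._fetch(self.url))
        return self._fields.get(name)

    @staticmethod
    def _parse(html):
        # Assumed markup: <span id="field_name">value</span>
        return dict(re.findall(r'<span id="(\w+)">([^<]*)</span>', html))
```

With this approach, asking for a second field from the same page does not trigger a second request.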

enhancement

Add date format validation to `test_extract_from_text_properly_implemented` on test_ScraperExtractFromTextTest.py

We had an error on courtlistener when extracting date_filed using `extract_from_text` from the recently added bap1: ``` { "OpinionCluster": {"date_filed": "July 29, 2022"}, }, ... File "/opt/courtlistener/cl/scrapers/tasks.py", line 179, in extract_doc_content...
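A rough sketch of the kind of check the test could run on each `extract_from_text` result. It assumes, without confirmation from the issue, that date fields are the keys starting with `date_` and that the expected format is `YYYY-MM-DD`; the helper name `find_bad_dates` is hypothetical.

```python
from datetime import datetime


def find_bad_dates(extracted):
    """Return (model, field, value) tuples whose value does not parse
    with the "%Y-%m-%d" format.

    Assumption: `extracted` maps model names to dicts of field values,
    and every key starting with "date_" holds a date string.
    """
    bad = []
    for model, fields in extracted.items():
        for field, value in fields.items():
            if not field.startswith("date_"):
                continue
            try:
                datetime.strptime(value, "%Y-%m-%d")
            except (TypeError, ValueError):
                # TypeError covers non-string values such as None.
                bad.append((model, field, value))
    return bad
```

A test could then assert that `find_bad_dates(result)` is empty for every example, which would flag the `"July 29, 2022"` value above before it reaches courtlistener.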