Rossi
I have divided this task into 2 broad approaches, depending on whether the versions share the exact same URL or not.

### Versions that share the same URL

The easiest...
The courts affected the most (the top 15 rows sum to 153990):

| court_id | count |
|-------------------|-------|
| coloctapp | 56301 |
| neb | 24697 |

...
I'm running the script below, since the `Site.cleanup_content` we actually needed no longer exists in Juriscraper. It uses this [one](https://github.com/freelawproject/juriscraper/blob/304315ab5fb1fbc1a84bd75afe8e014834e2ccf8/juriscraper/opinions/united_states/state/ny.py) instead. This is the current progress (it must be run with all the...
For the big `nytrial` courts there was nothing of note, but we should run it for the "child" courts `nysupct`.

[nysupct-different-hash-duplicates.txt](https://github.com/user-attachments/files/22504738/nysupct-different-hash-duplicates.txt)
[nytrial-logs.txt](https://github.com/user-attachments/files/22517312/nytrial-logs.txt)

Other courts that should be run: ```sql select...
Ran another iteration with a new `cleanup_content` that also deletes `iframe` and uses `.strip()` before returning.

Deleted 77k opinions that had been versioned together for `nyappdiv`: {'different hash after cleanup':...
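To illustrate why deleting `iframe` elements and stripping whitespace can make two versions hash the same, here is a minimal sketch. It is not the actual Juriscraper `cleanup_content`: the regex-based removal, the `sha1_of` helper, and the sample documents are all assumptions made for the example.

```python
import hashlib
import re

# Hypothetical stand-in for the cleanup step; NOT Juriscraper's real code.
# Matches a paired <iframe ...>...</iframe> element, case-insensitively.
IFRAME_RE = re.compile(r"<iframe\b[^>]*>.*?</iframe\s*>", re.IGNORECASE | re.DOTALL)


def cleanup_content(content: bytes) -> bytes:
    """Remove iframe elements, then strip leading/trailing whitespace."""
    text = content.decode("utf-8", errors="replace")
    text = IFRAME_RE.sub("", text)
    return text.strip().encode("utf-8")


def sha1_of(content: bytes) -> str:
    """Hex SHA1 digest of the given bytes."""
    return hashlib.sha1(content).hexdigest()


# Two made-up downloads that differ only in the iframe and in whitespace.
v1 = b"<html><body><p>Opinion text</p><iframe src='https://example.com/a?t=1'></iframe></body></html>\n"
v2 = b"  <html><body><p>Opinion text</p><iframe src='https://example.com/a?t=2'></iframe></body></html>"

raw_hashes_differ = sha1_of(v1) != sha1_of(v2)
clean_hashes_match = sha1_of(cleanup_content(v1)) == sha1_of(cleanup_content(v2))
```

In this toy case the raw hashes differ while the cleaned hashes match, which is the kind of situation where versions would be recognized as duplicates only after cleanup.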
The final duplicate stragglers are due to a limitation in the `delete_duplicates` code. It tries to keep the last available version as the "main" candidate, and compares all the other...
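The "keep the last version as the main candidate" comparison can be sketched roughly as follows. This is not the actual `delete_duplicates` code: `Version`, its fields, and `find_duplicates` are hypothetical names, and comparing the SHA1 of stripped content is an assumption.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Version:
    # Hypothetical fields, chosen only for this illustration.
    opinion_id: int
    content: bytes


def content_hash(version: Version) -> str:
    """SHA1 of the stripped content (assumed comparison key)."""
    return hashlib.sha1(version.content.strip()).hexdigest()


def find_duplicates(versions: list[Version]):
    """Keep the last version as main; compare every other version against it."""
    main = versions[-1]
    main_hash = content_hash(main)
    duplicates, leftovers = [], []
    for version in versions[:-1]:
        if content_hash(version) == main_hash:
            duplicates.append(version)   # same content as main
        else:
            leftovers.append(version)    # differs from main, so it is kept
    return main, duplicates, leftovers


versions = [
    Version(1, b"same text\n"),
    Version(2, b"other text"),
    Version(3, b"same text"),
]
main, duplicates, leftovers = find_duplicates(versions)
```

Here version 3 is the main candidate, version 1 is flagged as its duplicate, and version 2 is left untouched because it only ever gets compared against the main candidate.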
With the fix in [place](https://github.com/freelawproject/courtlistener/issues/6398), I reran the command.

`nyappdiv`: `{'same cluster': 853, 'same docket': 853, 'deleted opinion': 853}`

`nyappterm`: `{'same cluster': 1976, 'same docket': 1976, 'deleted opinion': 1976}`
Now there are no NY URLs with more than 10 opinions related to them. The big groups we have left are `coloctapp` and `neb`: ```sql select court_id, download_url, count, nsha1 from...
Running this query and displaying it somewhere is the most basic scraper status page: ```sql SELECT court_id, max(so.date_created) as latest_creation FROM (SELECT * FROM search_docket WHERE court_id IN (SELECT id...
A materialized view basically stores the results of a query as a named "table":

- saves computation time if you just want to check the stored results
- makes it...
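The idea of storing query results under a name can be tried out with a small sketch. SQLite has no materialized views, so this emulates one with `CREATE TABLE ... AS SELECT` and a manual refresh; in PostgreSQL the equivalents are `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW`. The `opinions` and `scraper_status` tables and the sample rows are hypothetical, not the real CourtListener schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy data; table and column names are assumptions for this example only.
conn.executescript("""
    CREATE TABLE opinions (court_id TEXT, date_created TEXT);
    INSERT INTO opinions VALUES
        ('neb', '2025-09-01'),
        ('neb', '2025-09-20'),
        ('coloctapp', '2025-08-15');
""")

STATUS_QUERY = """
    SELECT court_id, MAX(date_created) AS latest_creation
    FROM opinions
    GROUP BY court_id
"""


def refresh_status(conn: sqlite3.Connection) -> None:
    """Recompute the query and store its results as a plain table."""
    conn.execute("DROP TABLE IF EXISTS scraper_status")
    conn.execute(f"CREATE TABLE scraper_status AS {STATUS_QUERY}")
    conn.commit()


refresh_status(conn)

# New data arrives; the stored results do not change until the next refresh.
conn.execute("INSERT INTO opinions VALUES ('neb', '2025-10-01')")
stale = dict(conn.execute("SELECT court_id, latest_creation FROM scraper_status"))

refresh_status(conn)
fresh = dict(conn.execute("SELECT court_id, latest_creation FROM scraper_status"))
```

Reading `scraper_status` is cheap because the aggregation has already been done, but the trade-off is visible in `stale` versus `fresh`: the stored results only reflect new rows after a refresh.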