`ri` URL has changed
Sentry Issue: COURTLISTENER-7EK
HTTPError: 404 Client Error: Not Found for url: https://www.courts.ri.gov/Courts/SupremeCourt/SupremeOrders/Forms/20232024.aspx
(2 additional frame(s) were not displayed)
...
File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 387, in handle
self.parse_and_scrape_site(mod, options)
File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 350, in parse_and_scrape_site
site = mod.Site().parse()
We need to update the scraper to the new endpoint https://www.courts.ri.gov/Pages/ood.aspx?k=(RIJCourt:%27Supreme%27)%20AND%20(ContentType:%27RIJOpinion%27)
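For reference, the query string in the new endpoint is just a percent-encoded search expression. A minimal sketch of how it could be rebuilt in Python follows; the helper name `build_search_url` and its parameters are hypothetical, and only the `Supreme` / `RIJOpinion` values come from the URL above:

```python
from urllib.parse import quote

# Base page taken from the URL in this issue.
SEARCH_PAGE = "https://www.courts.ri.gov/Pages/ood.aspx"


def build_search_url(court: str, content_type: str) -> str:
    """Hypothetical helper: rebuild the `k=` search expression.

    Only court="Supreme" and content_type="RIJOpinion" are confirmed
    by the issue; other values are untested assumptions.
    """
    expression = f"(RIJCourt:'{court}') AND (ContentType:'{content_type}')"
    # Keep parentheses and colons literal so the result matches the
    # URL as reported; quotes and spaces become %27 and %20.
    return f"{SEARCH_PAGE}?k={quote(expression, safe='():')}"


url = build_search_url("Supreme", "RIJOpinion")
```

This only reproduces the page URL; it says nothing about the JSON endpoint or the parameters that endpoint expects.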
Explaining the changes
We have two `ri` scrapers, ri_u.py and ri_p.py. They fetch the URLs below, where the year pair in the path used to change with the court term. From a comment in the script: "This court hears things from mid-September to end of June. This defines the "term" for that year, which triggers the website updates." The term ends on: term_end = datetime(this_year, 9, 15)
https://www.courts.ri.gov/Courts/SupremeCourt/SupremeOpinions/Forms/20232024.aspx https://www.courts.ri.gov/Courts/SupremeCourt/SupremeOrders/Forms/20232024.aspx
So, we might think that the URLs were simply updated before the term ended, but they do not exist for either future or past terms:
https://www.courts.ri.gov/Courts/SupremeCourt/SupremeOpinions/Forms/20242025.aspx https://www.courts.ri.gov/Courts/SupremeCourt/SupremeOrders/Forms/20242025.aspx
https://www.courts.ri.gov/Courts/SupremeCourt/SupremeOpinions/Forms/20222023.aspx https://www.courts.ri.gov/Courts/SupremeCourt/SupremeOrders/Forms/20222023.aspx
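To make the term logic concrete, here is a rough sketch of how the old term-based URLs could be derived from a date. The function names are hypothetical, and the rule that dates before September 15 belong to the term that started the previous year is an assumption inferred from the `term_end` comment, not copied from the scrapers:

```python
from datetime import date

BASE = "https://www.courts.ri.gov/Courts/SupremeCourt"


def term_slug(today: date) -> str:
    """Return the year pair used in the old URLs, e.g. '20232024'."""
    # Assumption: the term rolls over on September 15 of each year.
    term_end = date(today.year, 9, 15)
    start_year = today.year - 1 if today < term_end else today.year
    return f"{start_year}{start_year + 1}"


def legacy_urls(today: date) -> dict:
    """Build the old opinions/orders URLs for the term containing `today`."""
    slug = term_slug(today)
    return {
        "opinions": f"{BASE}/SupremeOpinions/Forms/{slug}.aspx",
        "orders": f"{BASE}/SupremeOrders/Forms/{slug}.aspx",
    }


urls = legacy_urls(date(2024, 6, 1))
```

Under that assumption, a date in June 2024 maps to the `20232024` URLs that now return 404.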
Sentry Issue: COURTLISTENER-7EK
Sentry Issue: COURTLISTENER-7EJ
Isn't that wonderful.
Ugh - I took a look at this and have a few thoughts.
- This should be an easy rewrite: they still expose a JSON endpoint.
- The U/P distinction should be merged into one scraper. The current website provides published opinions and published orders, both of which I think should be collected.
- The miscellaneous court orders etc. can be ignored imho.
- The endpoint is flexible enough to allow scraping 100s of opinions at a time and is quite responsive.
The only weirdness is the schemas.microsoft.com/sharepoint XML that is required for the parameters.
Partially fixed, but the scraper appears to stop early.
Closing this because the scraper is back up