`okla`, `oklacivapp`, `oklacrimapp` blocking document download
This has been happening since May 29. If we fix it ASAP, we won't have to write a backscraper to fill any gaps.
The source page loads, but the individual opinion pages do not. On a standalone, local Juriscraper it works fine, but it's blocked on the server.
python sample_caller.py -c juriscraper.opinions.united_states.state.okla --verbosity 3 -b
Checking from a server shell confirms that the server IP specifically is blocked:
In [1]: import requests
In [2]: r = requests.get("https://www.oscn.net/applications/oscn/deliverdocument.asp?citeid=548589")
Out[2]: <Response [403]>
In [3]: r.text
# below...
Sentry Issue: COURTLISTENER-9YJ
HTTPError: 403 Client Error: Forbidden for url: https://www.oscn.net/applications/oscn/deliverdocument.asp?citeid=548272
(1 additional frame(s) were not displayed)
...
File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 400, in handle
self.parse_and_scrape_site(mod, options)
File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 364, in parse_and_scrape_site
self.scrape_court(site, options["full_crawl"])
File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 261, in scrape_court
self.ingest_a_case(
File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 298, in ingest_a_case
content = get_binary_content(item["download_urls"], site)
File "cl/scrapers/utils.py", line 304, in get_binary_content
r.raise_for_status()
Sentry Issue: COURTLISTENER-9YD
Sentry Issue: COURTLISTENER-A2R
Sentry Issue: COURTLISTENER-9YJ
@Luis-manzur can you check if this is still happening and if we have any new opinions from the three courts?
These are the dates of the latest opinions in the system.
We are still blocked.
how did you gather them? @Luis-manzur
Using Juriscraper locally with the `--save-for-manual-upload` flag that I created @flooie
New variation, now using Cloudflare:
'\r\n<html>\r\n <head>\r\n\t<meta name="viewport" content="width=device-width, minimum-scale=1, initial-scale=1"> \r\n <title>OSCN Turnstile</title>\r\n\t<script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script>\r\n\t<style>\r\n\t\t.form_container {\r\n\t\t\tpadding-top: 2rem;\r\n\t\t\twidth: 304px;\r\n\t\t\ttext-align: center;\r\n\t\t\tmargin: auto;\r\n\t\t}\t\r\n\t\t.form_container input[type=\'submit\'] {\r\n\t\t\tpadding: 0.75rem;\r\n\t\t\ttext-transform: uppercase;\r\n\t\t}\r\n\t</style>\r\n\t\t<link rel="stylesheet" type="text/css" href="/wp-content/themes/oscn-theme/navigation-style.css?v=1.0.1">\t\r\n\t\t<link rel="stylesheet" type="text/css" href="/assets/css/oscn-navigation.css">\t\r\n </head>\r\n <body>\r\n\t<header id="masthead" class="site-header header" data-elastic-exclude style="">\r\n\t\t<div class="nav_container">\r\n\t\t\t\t<a class="oscn-brand" href="https://www.oscn.net/">\r\n\t\t\t\t\t<div class="oscn-brand-container">\r\n\t\t\t\t\t\t<div class...
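The two block variants seen in this thread (the plain 403 and the Cloudflare Turnstile page) could be told apart with a small check like the one below. This is only a sketch: `classify_block` and `TURNSTILE_MARKERS` are hypothetical names, not part of Juriscraper or CourtListener, and the markers are taken from the HTML snippet pasted above. It has not been confirmed which status code the Turnstile page is served with, so the body is checked regardless of status.

```python
from typing import Optional

# Substrings copied from the challenge page pasted in this thread.
# Assumption: OSCN keeps serving the same markup for the challenge.
TURNSTILE_MARKERS = (
    "challenges.cloudflare.com/turnstile",
    "<title>OSCN Turnstile</title>",
)


def classify_block(status_code: int, body: str) -> Optional[str]:
    """Return a short label if the response looks blocked, else None.

    Hypothetical helper for illustration only.
    """
    # Check the body first: the challenge page is the more specific signal,
    # and we don't know which status code it is served with.
    if any(marker in body for marker in TURNSTILE_MARKERS):
        return "turnstile"
    # The original variant: the server IP gets a plain 403.
    if status_code == 403:
        return "forbidden"
    return None
```

From a server shell it could be used as `classify_block(r.status_code, r.text)` on the `requests` response shown earlier, which would let the scraper log a clear "blocked" message instead of only raising in `raise_for_status()`.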
Sentry Issue: COURTLISTENER-A6X
Sentry Issue: COURTLISTENER-A6Y
We have a temporary workaround for this, so perhaps this can be moved to future work.