
`okla`, `oklacivapp`, `oklacrimapp` blocking document download

Open · grossir opened this issue 8 months ago · 13 comments

This has been happening since May 29. If we fix it ASAP, we won't have to write a backscraper to fill any gaps.

It loads the source page but not the individual opinion pages. Running Juriscraper standalone on a local machine works fine (`python sample_caller.py -c juriscraper.opinions.united_states.state.okla --verbosity 3 -b`), but on the server the requests are blocked.

Checking from a shell on the server confirms that the server's IP is specifically blocked:

In [1]: import requests

In [2]: r = requests.get("https://www.oscn.net/applications/oscn/deliverdocument.asp?citeid=548589")
Out[2]: <Response [403]>

In [3]: r.text
# response body was shared as a screenshot
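To compare the server and a local machine outside of Juriscraper, a small standard-library probe like the one below could be run on both hosts. This is a sketch under stated assumptions, not project code: `deliver_document_url` and `probe_status` are hypothetical helpers, and the probe uses `urllib` rather than `requests`, so its default User-Agent differs from the session above. Running the same script on both machines keeps the IP as the only variable.

```python
# Sketch: check which HTTP status an OSCN document URL returns from this host.
# `deliver_document_url` and `probe_status` are hypothetical helpers for
# illustration; they are not part of Juriscraper or CourtListener.
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://www.oscn.net/applications/oscn/deliverdocument.asp"


def deliver_document_url(citeid: int) -> str:
    """Build the deliverdocument.asp URL for one citeid."""
    return f"{BASE_URL}?{urlencode({'citeid': citeid})}"


def probe_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status code of a GET request to `url`.

    urlopen raises HTTPError for 4xx/5xx responses, so a 403 is caught
    and returned as a plain integer instead of an exception.
    Network failures (DNS errors, timeouts) still raise URLError.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code
```

For example, `probe_status(deliver_document_url(548589))` on a blocked host would be expected to return `403`, matching the session above.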

Sentry Issue: COURTLISTENER-9YJ

HTTPError: 403 Client Error: Forbidden for url: https://www.oscn.net/applications/oscn/deliverdocument.asp?citeid=548272
(1 additional frame(s) were not displayed)
...
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 400, in handle
    self.parse_and_scrape_site(mod, options)
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 364, in parse_and_scrape_site
    self.scrape_court(site, options["full_crawl"])
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 261, in scrape_court
    self.ingest_a_case(
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 298, in ingest_a_case
    content = get_binary_content(item["download_urls"], site)
  File "cl/scrapers/utils.py", line 304, in get_binary_content
    r.raise_for_status()

grossir · Jun 19 '25

Sentry Issue: COURTLISTENER-9YD

sentry[bot] · Jun 19 '25

Sentry Issue: COURTLISTENER-A2R

sentry[bot] · Jun 19 '25

Sentry Issue: COURTLISTENER-9YJ

sentry[bot] · Jun 19 '25

@Luis-manzur, can you check whether this is still happening and whether we have any new opinions from the three courts?

flooie · Jul 28 '25

These are the dates of the latest opinions in the system (shared as a screenshot).

We are still blocked.

Luis-manzur · Aug 04 '25

@flooie, here are three folders of missing Oklahoma court opinions to upload manually.

Luis-manzur · Aug 08 '25

@Luis-manzur, how did you gather them?

flooie · Aug 22 '25

@flooie, I ran Juriscraper locally with a `--save-for-manual-upload` flag that I created.

Luis-manzur · Aug 22 '25

Sentry Issue: COURTLISTENER-A6Z

@grossir

sentry[bot] · Aug 28 '25

New variation: the response is now a Cloudflare Turnstile challenge page:

'\r\n<html>\r\n  <head>\r\n\t<meta name="viewport" content="width=device-width, minimum-scale=1, initial-scale=1">    \r\n    <title>OSCN Turnstile</title>\r\n\t<script src="https://challenges.cloudflare.com/turnstile/v0/api.js" async defer></script>\r\n\t<style>\r\n\t\t.form_container {\r\n\t\t\tpadding-top: 2rem;\r\n\t\t\twidth: 304px;\r\n\t\t\ttext-align: center;\r\n\t\t\tmargin: auto;\r\n\t\t}\t\r\n\t\t.form_container input[type=\'submit\'] {\r\n\t\t\tpadding: 0.75rem;\r\n\t\t\ttext-transform: uppercase;\r\n\t\t}\r\n\t</style>\r\n\t\t<link rel="stylesheet" type="text/css" href="/wp-content/themes/oscn-theme/navigation-style.css?v=1.0.1">\t\r\n\t\t<link rel="stylesheet" type="text/css" href="/assets/css/oscn-navigation.css">\t\r\n  </head>\r\n  <body>\r\n\t<header id="masthead" class="site-header header" data-elastic-exclude style="">\r\n\t\t<div class="nav_container">\r\n\t\t\t\t<a class="oscn-brand" href="https://www.oscn.net/">\r\n\t\t\t\t\t<div class="oscn-brand-container">\r\n\t\t\t\t\t\t<div class...
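Since the block now shows up in two forms (a plain 403 and a Turnstile challenge page), a scraper could tell them apart before treating a response as an opinion. A minimal sketch, assuming the marker strings from the page quoted above stay stable; `classify_response` and `FetchOutcome` are hypothetical names, not part of Juriscraper or CourtListener:

```python
# Sketch: classify a document response as OK, a plain 403 block, or a
# Cloudflare Turnstile challenge. Names here are hypothetical; the marker
# strings are copied from the challenge page quoted in this issue.
from enum import Enum


class FetchOutcome(Enum):
    OK = "ok"
    FORBIDDEN = "forbidden"      # plain 403, as seen from the server IP
    TURNSTILE = "turnstile"      # Cloudflare Turnstile challenge page
    OTHER_ERROR = "other_error"  # any other non-2xx status


# Substrings that appear in the Turnstile page body quoted above.
TURNSTILE_MARKERS = (
    "challenges.cloudflare.com/turnstile",
    "<title>OSCN Turnstile</title>",
)


def classify_response(status_code: int, body: str) -> FetchOutcome:
    """Decide what kind of response a document request returned."""
    # Check the body first: the status code that accompanies the
    # challenge page is not known, so the markers are the safer signal.
    if any(marker in body for marker in TURNSTILE_MARKERS):
        return FetchOutcome.TURNSTILE
    if status_code == 403:
        return FetchOutcome.FORBIDDEN
    if 200 <= status_code < 300:
        return FetchOutcome.OK
    return FetchOutcome.OTHER_ERROR
```

Checking the body before the status is a deliberate choice: a challenge page served with a 200 would otherwise be saved as if it were an opinion.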

grossir · Aug 28 '25

Sentry Issue: COURTLISTENER-A6X

sentry[bot] · Sep 02 '25

Sentry Issue: COURTLISTENER-A6Y

sentry[bot] · Sep 02 '25

We have a temporary workaround for this; perhaps this should be moved to future work.

flooie · Oct 20 '25