juriscraper

nycourts.gov scrape responds with HTTP 403 Forbidden

Open sentry[bot] opened this issue 1 year ago • 6 comments

While debugging a user's report concerning daily search alerts for records originating from NY courts, I found the following Sentry events:

Filed by: @ERosendo

sentry[bot], Jun 11 '24 16:06

The user provided the IDs of the alerts they believe are not working. Reviewing the queries associated with these alerts, I noticed a common thread: most of them reference the following courts: ny, nyappdiv and nyappterm.

ERosendo, Jun 11 '24 16:06

I will work on this. All these Sentry issues are for nyappdiv (nyappdiv_1st, nyappdiv_2nd...). If I find any more, I will link them here.

grossir, Jun 11 '24 16:06

At first glance, it seems our scraper server's IP is blocked. I can access the forbidden URLs myself without any problem:

  • https://www.nycourts.gov/reporter/slipidx/aidxtable_1.shtml

  • https://www.nycourts.gov/reporter/slipidx/aidxtable_2.shtml

  • https://www.nycourts.gov/reporter/slipidx/aidxtable_3.shtml

  • https://www.nycourts.gov/reporter/slipidx/aidxtable_4.shtml
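One way to compare what the scraper server and a workstation see is to request each index page from both and record the status codes. The following is only a minimal sketch using the Python standard library. `fetch_status`, `check_urls` and `blocked` are hypothetical helper names, not part of juriscraper, and the request goes out with whatever default headers urllib sends.

```python
from urllib.error import HTTPError
from urllib.request import urlopen

# Index pages from this issue that returned 403 to the scraper server
URLS = [
    "https://www.nycourts.gov/reporter/slipidx/aidxtable_1.shtml",
    "https://www.nycourts.gov/reporter/slipidx/aidxtable_2.shtml",
    "https://www.nycourts.gov/reporter/slipidx/aidxtable_3.shtml",
    "https://www.nycourts.gov/reporter/slipidx/aidxtable_4.shtml",
]


def fetch_status(url, timeout=30):
    """Return the HTTP status code of a GET request to `url`."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # urllib raises on 4xx/5xx responses; the code is what we want to record
        return error.code


def check_urls(urls, fetch=fetch_status):
    """Map each URL to its status code. `fetch` is injectable for offline tests."""
    return {url: fetch(url) for url in urls}


def blocked(statuses):
    """Return the URLs that answered 403 Forbidden."""
    return [url for url, status in statuses.items() if status == 403]
```

Running `blocked(check_urls(URLS))` on both machines should help show whether the 403 depends on the source IP, although a difference in request headers could also explain it.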

We indeed have no data since May 17, 2024 for nyappdiv

This happens for most New York courts...

  • No data for ny since May 16, 2024. There are 9 decisions from that day to today on the source

  • No data for nyappterm since May 16, 2024. There are 12 opinions from that date to today on the source

  • nytrial ("New York Other Courts") seems to be working fine

@flooie perhaps you can talk with the courts?

grossir, Jun 11 '24 17:06

Sent a message to this contact form: https://iapps.courts.state.ny.us/webteam/webteam.jsp

grossir, Jun 11 '24 21:06

I called the New York State Law Reporting Bureau and was transferred to a voicemail, but I did not catch the name. I will update the CRM when I hear back. In the meantime, we wait.

flooie, Jun 13 '24 20:06

The apps.courts.state.ny.us website is now working.

flooie, Jul 01 '24 20:07

As an update: we have fresh data for:

  • nyappdiv: latest is July 26th

  • nytrial: latest is July 26th

  • nyappterm: we have data for May and June, and none for July. This may be due to the date range of the query, which we should make smaller: https://github.com/freelawproject/juriscraper/blob/a28e884b986b6651393c612adfab6399e6212bb1/juriscraper/opinions/united_states/state/nyappterm_1st.py#L46-L55

  • ny: unchanged, no data since May. But maybe the scraper is buggy?
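To illustrate what "making the date range smaller" could look like, here is a rough sketch that splits one long interval into short consecutive windows, each of which could be queried separately. It is not the code at the linked lines. `date_windows` and its `days` parameter are hypothetical, and it assumes the source accepts a start and an end date per query.

```python
from datetime import date, timedelta


def date_windows(start, end, days=7):
    """Yield inclusive (window_start, window_end) pairs covering start..end."""
    if days < 1:
        raise ValueError("days must be at least 1")
    one_day = timedelta(days=1)
    window_start = start
    while window_start <= end:
        # Clamp the last window so it never runs past `end`
        window_end = min(window_start + timedelta(days=days) - one_day, end)
        yield window_start, window_end
        window_start = window_end + one_day


# Example: cover the July gap in week-long windows
for window_start, window_end in date_windows(date(2024, 7, 1), date(2024, 7, 31)):
    print(window_start.isoformat(), window_end.isoformat())
```

Smaller windows mean more requests, so the window size would have to be balanced against how often we are willing to hit the site.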

Given this, maybe they are simply not blocking us anymore? @flooie

grossir, Jul 31 '24 17:07

This is still pending testing, since our contact at the Court is on vacation until September 3rd. However, he did clarify one thing: the site we target in the ny scraper (example) will not work even if everything goes OK with the API key. So we will probably have to rewrite that scraper.

If you're referring to the https://iapps.courts.state.ny.us/lawReporting/Search and https://www.nycourts.gov/ctapps/Decisions pages, you're right: the api-key would not work, according to the information provided to me by our security team. The only sites the api-key should work on are lrb.nycourts.gov/* and www.nycourts.gov/reporter/*. Are you finding this to be the case?
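For reference, the scope described in that reply can be written down as a small check that tells us which of our scraper URLs the key would cover. This is only a sketch of the two patterns quoted above. `api_key_applies` and `API_KEY_SCOPES` are hypothetical names, and nothing here says how the key itself is sent, since that is not stated in this thread.

```python
from urllib.parse import urlparse

# Scopes quoted by the court contact:
#   lrb.nycourts.gov/*  and  www.nycourts.gov/reporter/*
API_KEY_SCOPES = [
    ("lrb.nycourts.gov", "/"),
    ("www.nycourts.gov", "/reporter/"),
]


def api_key_applies(url):
    """Return True if `url` falls under one of the quoted api-key scopes."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path or "/"
    return any(
        host == scope_host and path.startswith(scope_path)
        for scope_host, scope_path in API_KEY_SCOPES
    )
```

Under this reading, the nyappdiv index pages under www.nycourts.gov/reporter/ are covered, while the two pages mentioned in the reply are not.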

What do you think, @flooie? There is more detail in the email thread.

grossir, Aug 21 '24 17:08

The changes are working. We got our first ny opinion since May: https://www.courtlistener.com/opinion/10118643/stefanik-v-hochul/?q=court_id%3Any&type=o&order_by=dateFiled+desc&stat_Published=on

grossir, Sep 16 '24 21:09