juriscraper

Error parsing district docket due to multiple dockets in single HTML, IndexError: list index out of range

Open • sentry-io[bot] opened this issue 2 years ago • 4 comments

Sentry Issue: COURTLISTENER-3CX

IndexError: list index out of range
(2 additional frame(s) were not displayed)
...
  File "cl/recap/tasks.py", line 527, in process_recap_docket
    data = report.data
  File "juriscraper/pacer/docket_report.py", line 386, in data
    return super().data
  File "juriscraper/pacer/docket_report.py", line 68, in data
    data["parties"] = self.parties
  File "juriscraper/pacer/docket_report.py", line 588, in parties
    self._add_criminal_data_to_parties(parties, party_rows)
  File "juriscraper/pacer/docket_report.py", line 745, in _add_criminal_data_to_parties
    parties[current_party_i]["criminal_data"] = criminal_data

sentry-io[bot] • Jan 04 '23 16:01

I briefly checked this error. It seems the problem is that the received file is a district docket that contains the same case (same title: USA v. BANKMAN-FRIED) multiple times, but with different docket numbers: 1:22-cr-00673-LAK-1, 1:22-cr-00673-LAK-2, 1:22-cr-00673-LAK-3

It also contains a docket entries table for each one.
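A quick way to recognize this kind of file before parsing is to look for more than one defendant-suffixed docket number for the same base case. The sketch below is a hypothetical heuristic, not juriscraper code: the regex and both helper names are assumptions modeled on the numbers above (`1:22-cr-00673-LAK-1` and so on), and it works on plain text rather than the parsed HTML tree.

```python
import re

# Hypothetical pattern for district criminal docket numbers that carry a
# judge-initials and defendant-number suffix, e.g. "1:22-cr-00673-LAK-1".
# Groups: (1) base case number, (2) judge initials, (3) defendant number.
DOCKET_WITH_DEFENDANT = re.compile(r"\b(\d+:\d{2}-[a-z]{2}-\d+)-([A-Z]+)-(\d+)\b")


def distinct_defendant_dockets(text: str) -> list[str]:
    """Return distinct defendant-suffixed docket numbers in order of first appearance."""
    seen: dict[str, None] = {}
    for match in DOCKET_WITH_DEFENDANT.finditer(text):
        seen.setdefault(match.group(0), None)
    return list(seen)


def looks_like_multiple_dockets(text: str) -> bool:
    """Heuristic: several defendant numbers for one base case suggest concatenated dockets."""
    defendants_by_base: dict[str, set[str]] = {}
    for match in DOCKET_WITH_DEFENDANT.finditer(text):
        defendants_by_base.setdefault(match.group(1), set()).add(match.group(3))
    return any(len(defendants) > 1 for defendants in defendants_by_base.values())
```

On text containing the three numbers above, `distinct_defendant_dockets` returns them once each, in order, and `looks_like_multiple_dockets` returns `True`.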

Here is the original file: html-example.zip

I'm wondering whether this is an error from PACER or something we need to support.

albertisfu • Jan 04 '23 16:01

Woah. I've never seen this before, I don't think. I guess it'd be nice to support this, but until we know how it's even generated, we can't really do it.

@johnhawkinson, have you ever seen a webpage like this in PACER, where it shows multiple dockets one after another after another? If so, do you know how it's created?

So you don't have to open Alberto's zip, the HTML is a single file with:

  • Case 1 docket (metadata, parties, entries)
  • Case 2 docket (metadata, parties, entries)
  • Case 3 docket (metadata, parties, entries)

All three cases are related to the same case via the doppelganger bug. Weird!
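If supporting this layout turns out to be worthwhile, one option is to split the file into one chunk per case and parse each chunk separately. This is a rough sketch under a loud assumption: that each docket section begins at the first occurrence of its own defendant-suffixed number. If the real HTML mentions another defendant's number earlier (for example in a heading), the split points will be wrong. `split_by_docket` and the regex are hypothetical, and any text before the first number is dropped.

```python
import re

# Hypothetical docket-number pattern (base number, judge initials, defendant number).
DOCKET_WITH_DEFENDANT = re.compile(r"\b\d+:\d{2}-[a-z]{2}-\d+-[A-Z]+-\d+\b")


def split_by_docket(text: str) -> list[tuple[str, str]]:
    """Split text into (docket_number, segment) pairs.

    Assumption: each docket section starts at the first occurrence of its
    own defendant-suffixed docket number. Text before the first match is dropped.
    """
    starts: dict[str, int] = {}
    # finditer scans left to right, so dict insertion order is position order.
    for match in DOCKET_WITH_DEFENDANT.finditer(text):
        starts.setdefault(match.group(0), match.start())
    ordered = list(starts.items())
    segments = []
    for i, (number, start) in enumerate(ordered):
        # Each segment runs until the next docket's first occurrence, or to the end.
        end = ordered[i + 1][1] if i + 1 < len(ordered) else len(text)
        segments.append((number, text[start:end]))
    return segments
```

Each segment could then be fed to the existing single-docket parsing path, though that wiring is not shown here.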

mlissner • Jan 04 '23 17:01

Not sure if this helps, but when I pulled this docket on PACER, the docket heading looked slightly different depending on whether I opened the 'All Defendants' docket sheet or a single-defendant docket sheet.

See attached screencaps (underline added, obviously): docket_alldefs, docket_singledef

The 'All Defendants' string might be useful when parsing. Either that or the judge-initials/defendant-number suffix (e.g., LAK-1, LAK-2...). It might help the parser know when to check the index against the list length before writing data to it.
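Along the lines of this suggestion, the write at `parties[current_party_i]["criminal_data"] = criminal_data` from the traceback could be guarded by a bounds check, with the 'All Defendants' heading used as an extra signal. Both helpers below are hypothetical and not part of juriscraper; the exact heading text is an assumption based on the screencaps.

```python
def is_all_defendants_sheet(text: str) -> bool:
    # Hypothetical check: assumes the 'All Defendants' heading text appears verbatim.
    return "All Defendants" in text


def set_criminal_data(parties: list[dict], index: int, criminal_data: dict) -> bool:
    """Attach criminal_data to parties[index] only if that index exists.

    Returns False instead of raising IndexError when the number of criminal-data
    rows and the party list disagree (e.g. several dockets in one HTML file).
    """
    if 0 <= index < len(parties):
        parties[index]["criminal_data"] = criminal_data
        return True
    return False
```

Returning `False` lets the caller log the mismatch and continue instead of raising `IndexError`; whether silently skipping criminal data beats crashing is a judgment call.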

DasWordNerd • Jan 12 '23 02:01

Honestly, I'm inclined to just let this crash some more. I don't think we've gotten this error more than a few times ever.

mlissner • Jan 12 '23 21:01