juriscraper
Error parsing district docket due to multiple dockets in single HTML, IndexError: list index out of range
Sentry Issue: COURTLISTENER-3CX
IndexError: list index out of range
(2 additional frame(s) were not displayed)
...
File "cl/recap/tasks.py", line 527, in process_recap_docket
data = report.data
File "juriscraper/pacer/docket_report.py", line 386, in data
return super().data
File "juriscraper/pacer/docket_report.py", line 68, in data
data["parties"] = self.parties
File "juriscraper/pacer/docket_report.py", line 588, in parties
self._add_criminal_data_to_parties(parties, party_rows)
File "juriscraper/pacer/docket_report.py", line 745, in _add_criminal_data_to_parties
parties[current_party_i]["criminal_data"] = criminal_data
I looked into this error briefly. The problem seems to be that the received file is a district docket that lists the same case (same title: USA v. BANKMAN-FRIED) multiple times, each time with a different docket number:
- 1:22-cr-00673-LAK-1
- 1:22-cr-00673-LAK-2
- 1:22-cr-00673-LAK-3
It also contains a docket entries table for each one.
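As a rough illustration of how a page like this could be flagged before parsing, here is a minimal sketch that collects the distinct defendant-specific docket numbers found in a page's text. The function name `find_defendant_docket_numbers` and the regex are assumptions made for this example, based only on the three docket numbers quoted in this issue; this is not part of juriscraper, and real PACER pages may format docket numbers differently.

```python
import re

# Assumed pattern, modeled on the numbers quoted in this issue,
# e.g. "1:22-cr-00673-LAK-1": office, year, case type, sequence,
# judge initials, defendant number.
DOCKET_NUMBER_RE = re.compile(r"\b\d+:\d{2}-[a-z]{2}-\d{5}-[A-Z]+-\d+\b")


def find_defendant_docket_numbers(text):
    """Return the distinct docket numbers in text, in first-seen order.

    Hypothetical helper for illustration only; not a juriscraper API.
    """
    matches = DOCKET_NUMBER_RE.findall(text)
    # dict.fromkeys removes duplicates while preserving order.
    return list(dict.fromkeys(matches))


def has_multiple_dockets(text):
    """True if more than one distinct docket number appears in text."""
    return len(find_defendant_docket_numbers(text)) > 1
```

A caller could use `has_multiple_dockets` to log or skip such files rather than letting the party parser raise an IndexError.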
Here is the original file: html-example.zip
I'm wondering if this is an error from PACER or something we need to support?
Woah. I've never seen this before, I don't think. I guess it'd be nice to support this, but until we know how it's even generated, I guess we can't really do it.
@johnhawkinson, have you ever seen a webpage like this in PACER, where it shows multiple dockets one after another after another? If so, do you know how it's created?
So you don't have to open Alberto's zip, the HTML is a single file with:
- Case 1 docket (metadata, parties, entries)
- Case 2 docket (metadata, parties, entries)
- Case 3 docket (metadata, parties, entries)
Each of the three dockets is related to the same case, via doppelganger bug. Weird!
Not sure if this helps, but when I pulled this docket on PACER, the docket heading looked slightly different depending on whether I opened the 'All Defendants' docket sheet or the single-defendant docket sheet.
See attached screencaps (underline added, obviously)
The 'All Defendants' string might be useful when parsing. Either that or the judge initials-defendant number string (e.g., LAK-1, LAK-2...). It might help the parser know when to check index length before writing data to the array.
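To make the suggestion above concrete, here is a minimal sketch of the two checks: looking for the 'All Defendants' string in a heading, and verifying the index before writing to the parties list. Both helper names (`looks_like_all_defendants`, `attach_criminal_data`) are hypothetical, and the exact heading text PACER uses is an assumption; only the failing assignment `parties[current_party_i]["criminal_data"] = criminal_data` comes from the traceback.

```python
def looks_like_all_defendants(heading):
    """Guess whether a docket heading is the 'All Defendants' view.

    Assumes the heading contains the literal phrase 'All Defendants';
    the real PACER wording has not been verified here.
    """
    return "all defendants" in heading.lower()


def attach_criminal_data(parties, current_party_i, criminal_data):
    """Write criminal_data to parties[current_party_i] if the index exists.

    Returns True on success and False when the index is out of range,
    instead of raising IndexError as in the traceback above.
    """
    if 0 <= current_party_i < len(parties):
        parties[current_party_i]["criminal_data"] = criminal_data
        return True
    return False
```

Returning False silently hides the malformed input, so a real fix would probably also log the case so the underlying multi-docket page can be investigated.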
Honestly, I'm inclined to just let this crash some more. I don't think we've gotten this error more than a few times ever.