docket_report.html fails on In Re: cases (sometimes)

Open johnhawkinson opened this issue 4 years ago • 0 comments

Some cases with "In Re:"-style dockets are not parsed properly, because the case name appears in the (nominal) parties section of the HTML. For instance, this HTML:

[Image: Screen Shot 2020-12-07 at 22 09 21]
<table border=0 cellspacing=0 width="100%">
					<tr><td align=right><SPAN >CLOSED</SPAN></td></tr>
				</table>
			<h3 align=center>United States District Court<BR>
Eastern District of Wisconsin (Milwaukee)<BR>
CIVIL DOCKET FOR CASE #: 2:20-mc-00047-PP</h3>
<TABLE width='100%' border=0 CELLSPACING=5><tr>
<td valign=top width="60%"><br>Assigned to: Chief Judge Pamela Pepper</td>
<td valign=top width="40%"><br>Date Filed: 12/04/2020<br>
Date Terminated: 12/04/2020</td>
</tr></table>
<table width="100%" border=0 cellspacing=5>
				<tr>
					<td><b><u>IN RE:                                  </u></b></td>
				</tr>
			
			<tr>
				<td valign=top width="40%">
					<B>Application for  Multi-Court Exemption from the Judicial Conference&#039;s Electronic Public Access Fees</B>
		</td>
</tr><tr><td></td></tr>

				<tr>
					<td><b><u>Movant                                  </u></b></td>
				</tr>
			
			<tr>
				<td valign=top width="40%">
					<B>Rebecca L Fordon</B>
		</td>
<td valign=top width="20%" align=right>represented&nbsp;by</td><td valign=top width="40%"><B>Rebecca L Fordon</B>
<BR>385 Charles E Young Dr, E
<BR>1106 Law Bldg
<BR>Los Angeles, CA 90095
<BR>PRO SE</td></tr><tr><td></td></tr>
</table>

produces this JSON:

{ u'assigned_to_str': u'Pamela Pepper',
  u'case_name': u'Unknown Case Title',
  u'cause': u'',
  u'court_id': u'cand',
  u'date_converted': None,
  u'date_discharged': None,
  u'date_filed': datetime.date(2020, 12, 4),
  u'date_terminated': datetime.date(2020, 12, 4),
  u'demand': u'',
  u'docket_entries': [],
  u'docket_number': u'2:20-mc-00047',
  u'jurisdiction': u'',
  u'jury_demand': u'',
  u'mdl_status': u'',
  u'nature_of_suit': u'',
  u'parties': [ { u'date_terminated': None,
                  u'extra_info': u'',
                  u'name': u"Application for  Multi-Court Exemption from the Judicial Conference's Electronic Public Access Fees",
                  u'type': u'In Re:'},
                { u'attorneys': [ { u'contact': u'385 Charles E Young Dr, E\n1106 Law Bldg\nLos Angeles, CA 90095\nPRO SE\n',
                                    u'name': u'Rebecca L Fordon',
                                    u'roles': []}],
                  u'date_terminated': None,
                  u'extra_info': u'',
                  u'name': u'Rebecca L Fordon',
                  u'type': u'Movant'}],
  u'referred_to_str': u''}

because the case caption is buried down in the parties section.

Now, you could argue that this is not the real caption, and that arguably there is no caption at all, both because you need to have "Parties and Counsel" checked to get this and because the iquery screen gives you this:

[Image: Screen Shot 2020-12-07 at 22 10 39]

I don't buy that argument and I think parsing this is better than leaving it as "Unknown Case Title."

However, I really despise the DocketReport parser class because of its lack of comments showing what it is doing with the HTML, and its bizarre circumlocutions and subsidiary functions that do weird non-obvious things so you have to single-step into them to see what is going on. I blew an hour trying to figure out structurally how to fix this and gave up.

This was a helpful incantation, though, to get you started on pdb:

(juriscraper) pb3:juriscraper jhawk$ cat > mpdb.py <<EOF
import pdb
import runpy
import sys


def main():
    module = sys.argv[1]
    sys.argv[1:] = sys.argv[2:]
    pdb.runcall(runpy.run_module, module, run_name='__main__')


__name__ == '__main__' and main()
EOF
(juriscraper) pb3:juriscraper jhawk$ python -m mpdb juriscraper.pacer.docket_report  wied-fordon.html

I think the answer is to note the special party type of "In Re:" and concatenate that with the next item. Maybe this is like how the adversary proceeding code works, but…it is too inscrutable for me to tell in the time allotted.
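To make the idea concrete, here is a rough sketch that works on the parsed output rather than inside the parser. It assumes a dict shaped like the JSON above. The helper name `fallback_case_name` and the constant `UNKNOWN_TITLE` are made up for illustration and are not part of juriscraper; the real fix probably belongs wherever DocketReport computes `case_name`.

```python
# Placeholder value seen in the JSON output above.
UNKNOWN_TITLE = "Unknown Case Title"


def fallback_case_name(docket):
    """Return the case name, falling back to an "In Re:" party if needed.

    `docket` is a dict shaped like the parser output shown above.
    Hypothetical helper, not part of juriscraper.
    """
    case_name = docket.get("case_name", "")
    if case_name and case_name != UNKNOWN_TITLE:
        # The parser already found a real caption; leave it alone.
        return case_name

    for party in docket.get("parties", []):
        party_type = party.get("type", "").strip()
        if party_type.rstrip(":").lower() == "in re":
            # Collapse runs of whitespace left over from the HTML.
            name = " ".join(party.get("name", "").split())
            if name:
                # Concatenate the special party type with the item after it.
                return "In Re: " + name

    return case_name or UNKNOWN_TITLE


docket = {
    "case_name": "Unknown Case Title",
    "parties": [
        {
            "type": "In Re:",
            "name": "Application for  Multi-Court Exemption from the "
                    "Judicial Conference's Electronic Public Access Fees",
        },
        {"type": "Movant", "name": "Rebecca L Fordon"},
    ],
}

print(fallback_case_name(docket))
```

A docket that already has a real caption is returned unchanged, and a docket with no "In Re:" party still comes back as "Unknown Case Title", so this only affects the failing case described here.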

Sorry.

johnhawkinson, Dec 08 '20 03:12