juriscraper
DocketReport: handle 'Date Entered' option
A PACER docket report can be run with either 'Date Entered' or 'Date Filed' dates. Previous code assumed, roughly, "A date is a date," but that's not so.
It's common for the date entered to lag the date filed by a day, which means that people who run the docket report with different values for that radio button would cause different dates to appear in the CL date_filed field.
So here we check the column heading of the first column against several known choices, or we throw an exception if it doesn't make sense.
We broaden the CELL_XPATH expression to include <th> as well as <td>, since in a few BK cases that's required.
We also normalize the column heading whitespace with a helper function normalize_whitespace() that arguably could live elsewhere.
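A minimal sketch of the heading check described above, not the code in this PR. The name normalize_whitespace() comes from the description, but its body here is a guess; HEADING_TO_FIELD and date_field_for_heading() are hypothetical names, and the three headings are only the ones mentioned in this description, which may not be the full list.

```python
def normalize_whitespace(text):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return " ".join(text.split())


# Assumed mapping from the first-column heading to the field that receives
# the date. 'Docket Date' is ambiguous, so it stays with date_filed as before.
HEADING_TO_FIELD = {
    "Date Filed": "date_filed",
    "Date Entered": "date_entered",
    "Docket Date": "date_filed",
}


def date_field_for_heading(raw_heading):
    """Return the field name for a heading, or raise if it is unrecognized."""
    heading = normalize_whitespace(raw_heading)
    try:
        return HEADING_TO_FIELD[heading]
    except KeyError:
        raise ValueError("Unexpected first-column heading: %r" % heading)
```

Raising on an unknown heading, rather than defaulting to date_filed, keeps a new report variant from silently writing dates into the wrong field.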
There's some ambiguity about what 'Docket Date' means (is it filing or entering?), but since it only appears historically, we'll place it where we have always done so -- as the date_filed.
This needs freelawproject/courtlistener#840, which adds support for the new date_entered field.
Can you pull out the pep8 stuff into a separate PR? I'm afraid I'll miss something with it all mixed together like this.
It turns out I missed something important: regardless of which variety of the report is run, there is a parenthesized note at the end that gives the Date Entered:
[Screenshot: screen shot 2018-06-08 at 17 25 46]
So this code should probably [attempt to?] parse that, too.
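One possible shape for that parsing, as a rough sketch only. It assumes the note looks like "(Entered: MM/DD/YYYY)" at the very end of the entry text, which should be checked against real dockets. ENTERED_RE and parse_entered_note() are hypothetical names, not existing juriscraper functions.

```python
import re
from datetime import datetime

# Assumed format of the trailing note: "(Entered: 06/08/2018)".
ENTERED_RE = re.compile(r"\(Entered:\s*(\d{1,2}/\d{1,2}/\d{4})\)\s*$")


def parse_entered_note(description):
    """Return the entered date from a trailing note, or None if absent."""
    match = ENTERED_RE.search(description)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%m/%d/%Y").date()
```

Returning None when the note is missing lets the caller fall back to whatever the first column provides instead of failing the whole parse.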
@johnhawkinson is this worth finishing?
This is worth doing, yes, but I don't know where we'll prioritize it. If John does it, we'll merge it, but I don't think it's high on our list otherwise.