ACMS: Missing "Entered" Date Prevents Docket Creation
A user reported that the extension wasn't creating a docket report record, even when they tried different browsers.
The specific case is 24-5998, Fitzpatrick, et al. v. City of Los Angeles, et al., from ca9. When I tried to upload it, the processing queue reported this error: "We encountered a parsing error while processing this item: Docket entry's Entered: date should not be null."
Upon reviewing our code, specifically lines 358-361 of acms_docket.py, I identified the root cause: our parser fails to extract the "Entered" date from the entry description, and the assertion at that spot rejects the entry.
https://github.com/freelawproject/juriscraper/blob/40f00ac3ebb1269af1335d8a730f0118b15b6c6c/juriscraper/pacer/acms_docket.py#L358-L361
The attached screenshot highlights the problematic entry, which is missing an "Entered" date.
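To make the failure mode concrete, here is a minimal sketch of the strict behavior. The pattern ENTERED_RE and the "[Entered: MM/DD/YYYY HH:MM AM/PM]" text format are assumptions for illustration, not the actual datetime_entered_regex from acms_docket.py, and get_entered_str and parse_entry_strict are hypothetical names.

```python
import re

# Hypothetical pattern: assumes descriptions end with text like
# "[Entered: 03/28/2025 02:15 PM]". The real regex in acms_docket.py may differ.
ENTERED_RE = re.compile(r"Entered:\s*(\d{2}/\d{2}/\d{4}\s+\d{1,2}:\d{2}\s*[AP]M)")


def get_entered_str(docket_text: str) -> str:
    """Return the raw "Entered:" datetime string, or "" if none is present."""
    match = ENTERED_RE.search(docket_text)
    return match.group(1) if match else ""


def parse_entry_strict(docket_text: str) -> str:
    """Mimic the original behavior: fail hard when the date is missing."""
    datetime_str = get_entered_str(docket_text)
    assert datetime_str != "", "Docket entry's Entered: date should not be null"
    return datetime_str
```

A description without the "Entered:" text yields an empty string, which trips the assertion and produces the same message the processing queue reported.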
Looks like an easy one. Should we do it now as part of the current push or push it to Ice Cream?
I think we can do it in the current sprint. This issue also impacts another case.
I think the fix is to use one of the values in the raw data as a fallback. While debugging CourtListener issue #5731, I noticed ACMS uses a field called endDateFormatted to display the date in the docket report. Although this field holds a date rather than a datetime (so it carries less information), using it as a fallback would let us parse the docket report.
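A minimal sketch of the fallback as proposed here, assuming the "Entered:" text uses a "MM/DD/YYYY HH:MM AM/PM" format and endDateFormatted uses "MM/DD/YYYY" (both formats are assumptions). parse_entered and entry_date are hypothetical helpers. The fallback path returns a plain date, which is the loss of precision described above.

```python
from __future__ import annotations

from datetime import date, datetime


def parse_entered(datetime_str: str) -> datetime | None:
    """Parse the "Entered:" text (assumed format), or return None if absent."""
    if not datetime_str:
        return None
    return datetime.strptime(datetime_str, "%m/%d/%Y %I:%M %p")


def entry_date(datetime_str: str, end_date_formatted: str) -> date:
    """Prefer the precise entered datetime; fall back to endDateFormatted.

    Annotated as date because datetime is a subclass of date.
    """
    entered = parse_entered(datetime_str)
    if entered is not None:
        return entered
    # Fallback: only day-level precision is available here.
    return datetime.strptime(end_date_formatted, "%m/%d/%Y").date()
```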
@mlissner, let me know your thoughts.
Sounds OK to me, I think, but endDateFormatted doesn't really sound like the same thing as date entered. How sure are you that it's correct?
@mlissner I'm very confident. I reviewed the Vue component that ACMS uses to render the table in the docket report and can confirm it's pulling from the endDateFormatted field.
I've attached a screenshot of the component and the related code to show you what I mean. I added comments to the code to match the corresponding elements in the picture.
<template>
  ...
  <table class="docket-entries">
    <tr>
      <td class="pa-0">
        <table>
          <tr v-for="item in sortedDocketEntries" :key="item.docketEntryId">
            <!-- First data cell: Date -->
            <td class="text-no-wrap">{{ item.endDateFormatted }}</td>
            <!-- Second data cell: Document number -->
            <td class="text-no-wrap">
              <!-- First variant of this cell:
                1. Has a checkbox
                2. Includes an anchor
                3. Small label that shows file size and number of pages
              -->
              <span v-if="documentsAreAccessible(item) && hasDocuments(item)">
                <input
                  type="checkbox"
                  class="document-selection"
                  v-model="item.selected"
                />
                <a
                  v-if="hasSingleDocument(item)"
                  href="javascript:void(0)"
                  class="entry-link"
                  @click="onSingleDocumentClick(item)"
                  >{{ item.entryNumber }} <img
                    v-if="isRestricted(item)"
                    border="2"
                    :src="restrictedDocIconUrlSrc"
                /></a>
                <a
                  v-else
                  href="javascript:void(0)"
                  class="entry-link"
                  @click="onMultiDocumentClick(item)"
                  >{{ item.entryNumber }} <img
                    v-if="isRestricted(item)"
                    border="2"
                    :src="restrictedDocIconUrlSrc"
                /></a>
                <p class="document-stats-label">
                  {{ item.pageCount }} pg. {{ item.fileSize }} KB
                </p>
              </span>
              <!-- Second variant of this cell:
                1. Just the entry number, no anchor
              -->
              <span v-else>
                {{ item.entryNumber }}
              </span>
            </td>
            <!-- Third data cell: Entry description -->
            <td v-html="item.docketEntryText" class="docket-entry-text"></td>
          </tr>
        </table>
      </td>
    </tr>
  </table>
  ...
</template>
As you can see, the first data cell in each table row is set to display {{ item.endDateFormatted }}. The code doesn't have any other logic for that cell, so we can be sure that's the field being used every time.
Let me know if you have any questions.
Hm, can you look at entry 29 from this case? Doesn't it show a different date in the first column than the date entered displayed at the end of the third column? If that holds up, I'm convinced!
@mlissner You're totally right about the discrepancy in entry 29! 🤦
I went ahead and purchased the document. It looks like the date in the header matches endDateFormatted, and the last page of the document also shows March 28. I've attached a screenshot of the document for reference.
So we agree that endDateFormatted is the date filed, not the date entered? If so, sounds like we can't get the date entered in some cases, right?
Yes, that sounds about right.
That's correct. So far, we've identified two cases where the date entered isn't included in the docket entry description. There's no obvious pattern yet.
So it seems like we just need to make sure the system works without that data always being available?
@mlissner I ran some experiments and tweaked the Juriscraper code to assign None to the date when parsing isn't possible. Here's the change:
     datetime_str = self._get_value(datetime_entered_regex, docket_text)
-    assert datetime_str != "", (
-        "Docket entry's Entered: date should not be null"
-    )
-    de["date_entered"] = convert_date_string(
-        datetime_str, datetime=True
-    )
+    de["date_entered"] = (
+        convert_date_string(datetime_str, datetime=True)
+        if datetime_str
+        else None
+    )
     # Unfortunately, the server expects a `date_filed`,
     # which we don't have, and probably can't get (it's in the
     # NDA though). Although the server also doesn't know that
     # other parsers send it `date_entered` in lieu of
     # `date_filed`, without being so explicit about it. Ugh!
     de["date_filed"] = de["date_entered"]
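The patched logic can be exercised in isolation. This sketch replaces Juriscraper's convert_date_string with a stdlib stand-in (convert_entered, hypothetical, assuming a "MM/DD/YYYY HH:MM AM/PM" format) and wraps the two assignments in a hypothetical build_entry helper, so it shows only the control flow of the change, not Juriscraper's real parsing.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def convert_entered(datetime_str: str) -> datetime:
    # Hypothetical stand-in for convert_date_string(..., datetime=True);
    # assumes a "MM/DD/YYYY HH:MM AM/PM" format.
    return datetime.strptime(datetime_str, "%m/%d/%Y %I:%M %p")


def build_entry(datetime_str: str) -> dict[str, Any]:
    """Apply the patched assignments to a fresh entry dict."""
    de: dict[str, Any] = {}
    # An empty string is falsy, so a missing date becomes None instead of failing.
    de["date_entered"] = convert_entered(datetime_str) if datetime_str else None
    # As in the patch, date_filed simply mirrors date_entered.
    de["date_filed"] = de["date_entered"]
    return de
```

With an empty datetime_str, both date_entered and date_filed end up as None, so any downstream code that reads date_filed has to tolerate a missing value.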
While this tweak works for Juriscraper, I noticed an issue when testing it with CourtListener: our code filters out docket entries that have no date. In the add_docket_entries function, the first line after the docstring does this filtering.
async def add_docket_entries(
    d: Docket,
    docket_entries: list[dict[str, Any]],
    tags: list[Tag] | None = None,
    do_not_update_existing: bool = False,
) -> tuple[
    tuple[list[DocketEntry], list[RECAPDocument]], list[RECAPDocument], bool
]:
    """
    ...
    """
    # Remove items without a date filed value.
    docket_entries = [de for de in docket_entries if de.get("date_filed")]
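The comprehension relies on truthiness, so a missing "date_filed" key, an explicit None, and an empty string are all dropped. A small illustration with made-up entries; only the "date_filed" key mirrors the real code, and "document_number" is used here just to identify rows.

```python
from datetime import date

# Made-up sample entries for illustration only.
docket_entries = [
    {"document_number": "1", "date_filed": date(2025, 3, 27)},
    {"document_number": "2", "date_filed": None},  # parser returned no date
    {"document_number": "3"},  # key missing entirely
]

# Same filter as the first line of add_docket_entries.
kept = [de for de in docket_entries if de.get("date_filed")]

# The entries the filter silently discards.
dropped = [de["document_number"] for de in docket_entries if not de.get("date_filed")]
```

Relaxing this filter would mean every later step that reads date_filed must handle None.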
I've updated add_docket_entries so we can now process entries even when there's no date. Here's how the docket report would look:
Since the parser didn't provide a date for this specific entry, the recap_sequence_number wasn't computed. This causes the docket entry to appear first in the docket report, as the report relies on this field for ordering.
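A sketch of why an undated entry floats to the top, assuming (not confirmed in this thread) that recap_sequence_number is a string of the form "YYYY-MM-DD.NNN" and is left as an empty string when no date is available. make_sequence_number is a hypothetical helper, and its counter is simplified to the entry's position in the list.

```python
from __future__ import annotations

from datetime import date


def make_sequence_number(filed: date | None, index: int) -> str:
    """Hypothetical: "YYYY-MM-DD.NNN" when a date exists, "" otherwise."""
    if filed is None:
        return ""
    return f"{filed.isoformat()}.{index:03d}"


entries = [
    {"entry_number": 1, "filed": date(2025, 1, 10)},
    {"entry_number": 2, "filed": date(2025, 2, 3)},
    {"entry_number": 3, "filed": None},  # date could not be parsed
]
for i, de in enumerate(entries, start=1):
    de["recap_sequence_number"] = make_sequence_number(de["filed"], i)

# Ascending string sort: "" compares less than any non-empty string,
# so the undated entry lands first.
ordered = sorted(entries, key=lambda de: de["recap_sequence_number"])
```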
I thought we still had the date filed, just not the date entered in this case? The docket sheet uses the date filed.
Oh yeah, we still have the date filed (endDateFormatted). Sorry, I misunderstood your previous message and thought you were suggesting we explore a scenario where our code gets an entry with no date information at all 😅.
For ACMS dockets, we store the "date entered" in the database and discard the other dates in the raw data. So, just to confirm: do you think we should use the date filed (endDateFormatted) when we can't parse a date from the docket entry description?
I actually think it's a bug that we're putting the date entered values into our system's date filed field. Juriscraper should return both (one day we'll add a date entered field to our data model), but CL should be focused on the date filed, not the date entered, unless we had a reason to do otherwise. (We may have a reason. I'm really not sure. This feels like a bug to me though.)
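One way to follow this suggestion, as a sketch only: the parser returns both dates, with date_filed taken from endDateFormatted and date_entered parsed from the description when present, and CourtListener would read date_filed. parse_acms_entry and both date formats are assumptions, and the dates in the checks are made up.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def parse_acms_entry(end_date_formatted: str, entered_str: str) -> dict[str, Any]:
    """Hypothetical parser output carrying both dates separately."""
    return {
        # Date filed: from endDateFormatted (assumed "MM/DD/YYYY").
        "date_filed": datetime.strptime(end_date_formatted, "%m/%d/%Y").date(),
        # Date entered: optional, parsed from the "Entered:" text when present
        # (assumed "MM/DD/YYYY HH:MM AM/PM").
        "date_entered": (
            datetime.strptime(entered_str, "%m/%d/%Y %I:%M %p")
            if entered_str
            else None
        ),
    }
```

Because date_filed comes from endDateFormatted, which appears to be rendered for every row, an entry without an "Entered:" date would still keep a usable date.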