juriscraper

ACMS: Missing "Entered" Date Prevents Docket Creation

Open · ERosendo opened this issue 10 months ago · 13 comments

A user reported that the extension wasn't creating a docket report record, even when they tried from different browsers.

The specific case is 24-5998: Fitzpatrick, et al. v. City of Los Angeles, et al. from ca9. When I tried to upload this case, the processing queue reported this error: "We encountered a parsing error while processing this item: Docket entry's Entered: date should not be null."

Upon reviewing lines 358-361 of acms_docket.py, I identified the root cause: our parser fails to extract the "Entered" date from the entry description, so the assertion on those lines raises the error above.

https://github.com/freelawproject/juriscraper/blob/40f00ac3ebb1269af1335d8a730f0118b15b6c6c/juriscraper/pacer/acms_docket.py#L358-L361

The attached screenshot highlights the problematic entry, which is missing an "Entered" date.

[Screenshot: the docket entry that is missing an "Entered" date]

ERosendo · Jun 18 '25 23:06

Looks like an easy one. Should we do it now as part of the current push or push it to Ice Cream?

mlissner · Jun 19 '25 00:06

> Should we do it now as part of the current push or push it to Ice Cream?

I think we can do it in the current sprint. This issue also impacts another case.

I think the fix is to use one of the values in the raw data as a fallback. While debugging CourtListener issue #5731, I noticed that ACMS uses a field called endDateFormatted to display the date in the docket report. Although this field holds a date rather than a datetime (so it carries less information), using it as a fallback would let us parse the docket report.
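A minimal sketch of the fallback proposed above. It assumes, hypothetically, that ACMS writes the entered timestamp into `docketEntryText` as `[Entered: MM/DD/YYYY HH:MM AM]` and that `endDateFormatted` looks like `MM/DD/YYYY`; neither format is confirmed in this thread, and `ENTERED_RE` and `date_entered_with_fallback` are illustrative names, not Juriscraper's API.

```python
import re
from datetime import date, datetime

# Hypothetical pattern: assumes a trailing stamp like "[Entered: 03/28/2025 02:34 PM]".
ENTERED_RE = re.compile(
    r"\[Entered:\s*(\d{1,2}/\d{1,2}/\d{4}\s+\d{1,2}:\d{2}\s+[AP]M)\]"
)


def date_entered_with_fallback(docket_text: str, end_date_formatted: str) -> date:
    """Prefer the 'Entered' datetime from the text; otherwise fall back to
    endDateFormatted (assumed MM/DD/YYYY), which has no time of day."""
    match = ENTERED_RE.search(docket_text)
    if match:
        # datetime is a subclass of date, so both branches fit the annotation.
        return datetime.strptime(match.group(1), "%m/%d/%Y %I:%M %p")
    return datetime.strptime(end_date_formatted, "%m/%d/%Y").date()


print(date_entered_with_fallback("Order. [Entered: 03/31/2025 09:05 AM]", "03/28/2025"))
# 2025-03-31 09:05:00
print(date_entered_with_fallback("Order.", "03/28/2025"))
# 2025-03-28
```

The two branches return different kinds of value (a datetime versus a plain date), which is the loss of information mentioned above.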

@mlissner, let me know your thoughts.

ERosendo · Jun 23 '25 12:06

Sounds OK to me, I think, but endDateFormatted doesn't really sound like the same thing as date entered. How sure are you that it's correct?

mlissner · Jun 24 '25 04:06

> How sure are you that it's correct?

@mlissner I'm very confident. I reviewed the Vue component that ACMS uses to render the table in the docket report and can confirm it's pulling from the endDateFormatted field.

I've attached a screenshot of the component and the related code to show you what I mean. I added comments to the code to match the corresponding elements in the picture.

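[Screenshot: the ACMS docket report table rendered by this component, annotated to match the comments in the code]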
<template>
...
  <table class="docket-entries">
    <tr>
      <td class="pa-0">
        <table>
          <tr v-for="item in sortedDocketEntries" :key="item.docketEntryId">
            <!-- First data cell: Date -->
            <td class="text-no-wrap">{{ item.endDateFormatted }}</td>

            <!-- Second data cell: Document number -->
            <td class="text-no-wrap">
              <!-- First variant of this cell:
                1. Has a checkbox
                2. Includes an anchor
                3. Small label that shows file size and number of pages
              -->
              <span v-if="documentsAreAccessible(item) && hasDocuments(item)">

                <input
                  type="checkbox"
                  class="document-selection"
                  v-model="item.selected"
                />

                <a
                  v-if="hasSingleDocument(item)"
                  href="javascript:void(0)"
                  class="entry-link"
                  @click="onSingleDocumentClick(item)"
                  >{{ item.entryNumber }}&nbsp;<img
                    v-if="isRestricted(item)"
                    border="2"
                    :src="restrictedDocIconUrlSrc"
                /></a>

                <a
                  v-else
                  href="javascript:void(0)"
                  class="entry-link"
                  @click="onMultiDocumentClick(item)"
                  >{{ item.entryNumber }}&nbsp;<img
                    v-if="isRestricted(item)"
                    border="2"
                    :src="restrictedDocIconUrlSrc"
                /></a>

                <p class="document-stats-label">
                  {{ item.pageCount }} pg. {{ item.fileSize }} KB
                </p>
              </span>
              <!-- Second variant of this cell:
                1. Just the entry number, no anchor
              -->
              <span v-else>
                {{ item.entryNumber }}
              </span>
            </td>

            <!-- Third data cell: Entry description -->
            <td v-html="item.docketEntryText" class="docket-entry-text"></td>
          </tr>
        </table>
      </td>
    </tr>
  </table>
...
</template>

As you can see, the first data cell in each table row is set to display {{ item.endDateFormatted }}. The code doesn't have any other logic for that cell, so we can be sure that's the field being used every time.
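To make the mapping concrete, here is a small Python sketch that mirrors the three cells of each row. It uses only field names visible in the template (`endDateFormatted`, `entryNumber`, `docketEntryText`); `render_rows` and the sample entry are made up, the checkbox/link variants of the second cell are omitted, and the ordering behind `sortedDocketEntries` is not shown, so entries are assumed to arrive already sorted.

```python
from typing import Any


def render_rows(sorted_docket_entries: list[dict[str, Any]]) -> list[tuple[str, str, str]]:
    """Mirror the template's row loop: one (date, number, text) tuple per entry."""
    rows = []
    for item in sorted_docket_entries:
        rows.append(
            (
                item["endDateFormatted"],  # first cell: always endDateFormatted
                str(item["entryNumber"]),  # second cell: just the number here
                item["docketEntryText"],  # third cell: raw HTML (v-html)
            )
        )
    return rows


sample = [
    {
        "docketEntryId": "example-id",
        "endDateFormatted": "03/28/2025",
        "entryNumber": 1,
        "docketEntryText": "<p>Example entry text</p>",
    }
]
print(render_rows(sample)[0][0])  # 03/28/2025
```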

Let me know if you have any questions.

ERosendo · Jun 24 '25 04:06

Hm, can you look at entry 29 from this case? It has a different value for the first column's date and the date entered that's displayed at the end of the third column? If that holds up, I'm convinced!

mlissner · Jun 24 '25 04:06

> It has a different value for the first column's date and the date entered that's displayed at the end of the third column?

@mlissner You're totally right about the discrepancy in entry 29! 🤦

I went ahead and purchased the document. The date in its header matches endDateFormatted, and the last page of the document also shows March 28. I've attached a screenshot of the document for reference.

[Screenshot: the purchased document for entry 29]

ERosendo · Jun 24 '25 05:06

So we agree that endDateFormatted is the date filed, not the date entered? If so, sounds like we can't get the date entered in some cases, right?

mlissner · Jun 24 '25 05:06

> So we agree that endDateFormatted is the date filed, not the date entered?

Yes, that sounds about right.

> If so, sounds like we can't get the date entered in some cases, right?

That's correct. So far, we've identified two cases where the date entered isn't included in the docket entry description. There's no obvious pattern yet.
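One way to look for more of these entries is to compare, for each raw entry, the "Entered" date in `docketEntryText` with `endDateFormatted`. This minimal sketch rests on two assumptions not confirmed in this thread: the entered timestamp appears as `[Entered: MM/DD/YYYY HH:MM AM]`, and `endDateFormatted` is `MM/DD/YYYY`. `classify_entry` and `audit` are hypothetical helpers, and the sample entries are made up.

```python
import re
from datetime import datetime
from typing import Any

# Hypothetical pattern for the trailing "[Entered: ...]" stamp.
ENTERED_RE = re.compile(
    r"\[Entered:\s*(\d{1,2}/\d{1,2}/\d{4}\s+\d{1,2}:\d{2}\s+[AP]M)\]"
)


def classify_entry(item: dict[str, Any]) -> str:
    """Return 'missing', 'differs', or 'matches' for one raw ACMS entry."""
    match = ENTERED_RE.search(item["docketEntryText"])
    if match is None:
        return "missing"  # the case that currently trips the parser
    entered = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M %p").date()
    first_column = datetime.strptime(item["endDateFormatted"], "%m/%d/%Y").date()
    return "matches" if entered == first_column else "differs"


def audit(entries: list[dict[str, Any]]) -> dict[int, str]:
    return {item["entryNumber"]: classify_entry(item) for item in entries}


sample = [
    {"entryNumber": 1, "endDateFormatted": "03/01/2025",
     "docketEntryText": "Filed. [Entered: 03/01/2025 10:00 AM]"},
    {"entryNumber": 2, "endDateFormatted": "03/28/2025",
     "docketEntryText": "Filed. [Entered: 03/31/2025 09:05 AM]"},
    {"entryNumber": 3, "endDateFormatted": "04/02/2025",
     "docketEntryText": "Filed."},
]
print(audit(sample))  # {1: 'matches', 2: 'differs', 3: 'missing'}
```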

ERosendo · Jun 24 '25 06:06

So seems like we just need to make sure the system works without that data always being available?

mlissner · Jun 24 '25 13:06

> So seems like we just need to make sure the system works without that data always being available?

@mlissner I ran some experiments and tweaked the Juriscraper code to assign None to the date when parsing isn't possible. Here's the change:

            datetime_str = self._get_value(datetime_entered_regex, docket_text)
-           assert datetime_str != "", (
-               "Docket entry's Entered: date should not be null"
-           )
-           de["date_entered"] = convert_date_string(
-               datetime_str, datetime=True
-           )
+           de["date_entered"] = (
+               convert_date_string(datetime_str, datetime=True)
+               if datetime_str
+               else None
+           )

            # Unfortunately, the server expects a `date_filed`,
            # which we don't have, and probably can't get (it's in the
            # NDA though). Although the server also doesn't know that
            # other parsers send it `date_entered` in lieu of
            # `date_filed`, without being so explicit about it. Ugh!
            de["date_filed"] = de["date_entered"]

While this tweak works for Juriscraper, I noticed an issue when testing it with CourtListener: our code filters out docket entries whose date couldn't be parsed. The filtering happens in the first line of the add_docket_entries function:

async def add_docket_entries(
    d: Docket,
    docket_entries: list[dict[str, Any]],
    tags: list[Tag] | None = None,
    do_not_update_existing: bool = False,
) -> tuple[
    tuple[list[DocketEntry], list[RECAPDocument]], list[RECAPDocument], bool
]:
    """
    ...
    """
    # Remove items without a date filed value.
    docket_entries = [de for de in docket_entries if de.get("date_filed")]
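The effect of that first line is easy to reproduce in isolation: any entry whose `date_filed` is `None` or absent is dropped before anything else runs. `keep_dated` is a hypothetical wrapper around the same list comprehension, and the entries are made-up dictionaries.

```python
from datetime import date
from typing import Any


def keep_dated(docket_entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Same predicate as add_docket_entries: falsy date_filed values are dropped.
    return [de for de in docket_entries if de.get("date_filed")]


entries = [
    {"document_number": "1", "date_filed": date(2025, 3, 28)},
    {"document_number": "2", "date_filed": None},  # what the patched parser emits
    {"document_number": "3"},  # key absent entirely
]
print([de["document_number"] for de in keep_dated(entries)])  # ['1']
```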

I've updated add_docket_entries so it can now process entries even when there's no date. Here's how the docket report would look:

[Screenshot: the resulting CourtListener docket report, with the undated entry listed first]

Since the parser didn't provide a date for this specific entry, the recap_sequence_number wasn't computed. This causes the docket entry to appear first in the docket report, as the report relies on this field for ordering.

ERosendo · Jun 24 '25 17:06

I thought we still had the date filed, just not the date entered in this case? The docket sheet uses the date filed.

mlissner · Jun 24 '25 17:06

> I thought we still had the date filed, just not the date entered in this case?

Oh yeah, we still have the date filed (endDateFormatted). Sorry, I misunderstood your previous message and thought you were suggesting we explore a scenario where our code gets an entry with no date information at all 😅.

For ACMS dockets, we store the "date entered" in the database and discard the other dates in the raw data. So, just to confirm: do you think we should use the date filed (endDateFormatted) when we can't parse a date from the docket entry description?

ERosendo · Jun 24 '25 17:06

I actually think it's a bug that we're putting the date entered values into our system's date filed field. Juriscraper should return both (one day we'll add a date entered field to our data model), but CL should be focused on the date filed, not the date entered, unless we had a reason to do otherwise. (We may have a reason. I'm really not sure. This feels like a bug to me though.)

mlissner · Jun 24 '25 18:06