ParserError: day is out of range for month: 0

Open sentry-io[bot] opened this issue 1 year ago • 6 comments

This could be an anomaly, but we got 6k events like this in the last 24 hours, and I don't think any recent deploy would explain it. Could be somebody messing around with uploads, I'm not sure.

Sentry Issue: COURTLISTENER-4WX

_RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/dateutil/parser/_parser.py", line 649, in parse
    ret = self._build_naive(res, default)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dateutil/parser/_parser.py", line 1235, in _build_naive
    naive = default.replace(**repl)
            ^^^^^^^^^^^^^^^^^^^^^^^
ValueError: day is out of range for month

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/concurrent/futures/process.py", line 261, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/courtlistener/cl/recap/tasks.py", line 488, in parse_docket_text
    return report.data
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/juriscraper/pacer/docket_report.py", line 388, in data
    return super().data
           ^^^^...
ParserError: day is out of range for month: 0
(9 additional frame(s) were not displayed)
...
  File "concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
  File "concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "cl/recap/views.py", line 56, in perform_create
    await process_recap_upload(pq)
  File "cl/recap/tasks.py", line 110, in process_recap_upload
    docket = await process_recap_docket(pq.pk)
  File "cl/recap/tasks.py", line 526, in process_recap_docket
    data = await asyncio.get_running_loop().run_in_executor(
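
For readers tracing the failure: the innermost frame is `default.replace(**repl)`, a plain `datetime.replace` call, and the trailing `: 0` in the `ParserError` appears to be the string that was handed to the parser. The sketch below reproduces only the underlying `ValueError` using the standard library. The helper name `safe_replace_day` is hypothetical and is not part of juriscraper or dateutil.

```python
from datetime import datetime
from typing import Optional


def safe_replace_day(default: datetime, day: int) -> Optional[datetime]:
    """Return `default` with its day replaced, or None if the day is invalid.

    Hypothetical helper for illustration; not juriscraper's or dateutil's API.
    """
    try:
        return default.replace(day=day)
    except ValueError:
        # datetime raises ValueError for day=0 and for any day past the
        # end of the month, which is the error seen in the first traceback.
        return None


base = datetime(2023, 10, 10)
print(safe_replace_day(base, 0))   # None
print(safe_replace_day(base, 31))  # 2023-10-31 00:00:00
```

Whether returning `None` is the right behavior for a bad date cell is a separate design question. It only shows where a guard could sit if the parser is expected to tolerate malformed uploads.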

sentry-io[bot] avatar Oct 10 '23 22:10 sentry-io[bot]

Any idea what docket is triggering this?

ttys0dev avatar Oct 11 '23 00:10 ttys0dev

No, none. I guess we'll need to do a data export to see on this one. If you want to take it on, say so, and @albertisfu can help grab some samples?

mlissner avatar Oct 11 '23 00:10 mlissner

sure, can take a quick look if I can see a sample

ttys0dev avatar Oct 11 '23 00:10 ttys0dev

Sure, here are some examples extracted from S3 after checking the related ProcessingQueues.

These two are from cod. We've had hundreds, if not thousands, of errors and processing queues for these particular uploads in the last day.

9e835d1e06af4bd489197e6eb786960b.txt 5f597b318c504f7aa4f05f6a26497542.txt

And some other examples from other days and courts:

miwd 99b04a47f09844b8a57416d9ab9ea0bb.txt

cacd 39e9fd495f964b0c83118bda10a51336.txt

nysd b1737882acc84d2ab4983d203e87b460.txt

Let me know if you need additional details.

albertisfu avatar Oct 11 '23 14:10 albertisfu

9e835d1e06af4bd489197e6eb786960b.txt 5f597b318c504f7aa4f05f6a26497542.txt

The table structures appear to be corrupt on these... not sure we can do much here.

miwd 99b04a47f09844b8a57416d9ab9ea0bb.txt

cacd 39e9fd495f964b0c83118bda10a51336.txt

nysd b1737882acc84d2ab4983d203e87b460.txt

These are strange: they all have two dockets, one right after another.
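
One rough way to flag uploads like these before parsing, as a sketch only: it assumes the uploaded text is HTML and that each docket in a concatenated upload carries its own opening `<html>` tag, which has not been checked against the sample files. The function names are hypothetical.

```python
import re

# Assumption: every docket in an upload starts with its own <html> tag.
HTML_OPEN = re.compile(r"<html\b", re.IGNORECASE)


def count_html_documents(text: str) -> int:
    """Count opening <html> tags as a proxy for the number of documents."""
    return len(HTML_OPEN.findall(text))


def looks_concatenated(text: str) -> bool:
    """Return True when the upload seems to hold more than one document."""
    return count_html_documents(text) > 1


single = "<html><body>docket A</body></html>"
double = single + "<HTML><body>docket B</body></html>"
print(looks_concatenated(single))  # False
print(looks_concatenated(double))  # True
```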

ttys0dev avatar Oct 12 '23 22:10 ttys0dev

Hm, OK. Alberto, let's put this on your post-elastic-search list and maybe you can look and decide if there's anything we can do. Weird that we got thousands of bad dockets in so little time.

mlissner avatar Oct 13 '23 20:10 mlissner