courtlistener
Parsing other forms of docket number
This follows on from issue #1272, as discussed on Slack. When a search by docket number is done in the RECAP Archive and the number isn't in the standard ECF form (for instance, if I enter 21-1234 instead of 1:21-cv-1234-ABC), it doesn't find the docket.
Hm, this is a tricky one because you're giving us a docket number without the letters in it and asking us to match up on ones that have the letters in it. So when you search for:
21-1234
That doesn't match (in a technical sense) the docket number:
1:21-cv-1234
It's like asking for the word turkey to match the word eagle. They're similar, but different.
Now, we do have what I call the "docket_number_core" in the DB. In this case, I think it'd be 211234, so we could, in theory, be matching up against that. But that's in the DB, not in search, so we can't match that up like you'd hope unless we added it to the search index.
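To make the "core" idea concrete, here is a minimal sketch of reducing both the short and the full ECF form to the same string. The function name `docket_core`, the regular expression, and the "year digits plus sequence digits" rule are assumptions based on the 211234 guess above; the real docket_number_core field may be built differently (for example, with zero padding).

```python
import re
from typing import Optional

# Hypothetical pattern, not CourtListener's actual parser:
# optional "office:" prefix, two-digit year, optional case-type letters,
# sequence number, then any number of trailing judge-initial suffixes.
DOCKET_RE = re.compile(
    r"^(?:\d+:)?(?P<year>\d{2})-(?:(?P<type>[a-z]{2,4})-)?(?P<seq>\d+)(?:-[A-Za-z]+)*$",
    re.IGNORECASE,
)


def docket_core(docket_number: str) -> Optional[str]:
    """Return year + sequence digits, or None if the input doesn't parse."""
    match = DOCKET_RE.match(docket_number.strip())
    if match is None:
        return None
    # Assumption: the core is just the two parts concatenated, unpadded.
    return match.group("year") + match.group("seq")
```

Under these assumptions, `docket_core("21-1234")` and `docket_core("1:21-cv-1234-ABC")` produce the same value, which is what would let the two forms match if the core were added to the search index.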
I guess we could take all the queries of the form dd-dddd and explode them into a query like:
dd-cv-dddd OR dd-cr-dddd OR dd-mj-dddd OR dd-dddd
I guess that'd work, but it's pretty nasty. Or I guess we could do a proximity phrase search instead. Something like:
docketNumber:"21-551"~2
That'd allow any single term to go between the two numbers. It's a solution, I suppose, and perhaps it's what the user is after?
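A rough sketch of how the proximity rewrite could be applied to user input follows. The helper name `proximity_docket_query` and the `dd-dddd` detection regex are made up for illustration; the field name and slop value are taken from the example above. Whether this actually matches depends on how the docketNumber field is tokenized in the index, which this sketch does not verify.

```python
import re
from typing import Optional

# Hypothetical detector for the short "dd-dddd" form.
SHORT_DOCKET_RE = re.compile(r"^(\d{2})-(\d+)$")


def proximity_docket_query(
    user_input: str, slop: int = 2, field: str = "docketNumber"
) -> Optional[str]:
    """Build a proximity phrase query for a short docket number.

    Returns None when the input isn't in the short form, so the caller
    can fall back to the normal search path.
    """
    match = SHORT_DOCKET_RE.match(user_input.strip())
    if match is None:
        return None
    year, seq = match.groups()
    return f'{field}:"{year}-{seq}"~{slop}'
```

For the input `21-551` this yields the same query string as the example above; a full ECF number such as `1:21-cv-551` is left for the regular search path.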
What do our logs of failed searches look like?
I think we can worry about the 2 common cases of civil and criminal and not worry too much about Magistrate Judge cases or Miscellaneous Business Docket cases or MDLs or Petty Offenses &c. &c., so I think it would be ok to break it into dd-cv-dddd OR dd-cr-dddd. On the other hand, if the proximity thing works, that sounds like the way to go!
As for analogies, I think it's more like asking for "turkey" to match the word "turnkey." :)
I'm not good enough at our logs to figure that out, but I suppose it's theoretically possible.
Glad you found a better analogy. I was hoping somebody would.
Yeah, John hit the nail on the head. bk and ap might also be there for bankruptcy. But I'd guess the majority of searches are cv, with cr as second and the rest as marginal cases.
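For comparison, the "explode the query" option discussed earlier might look something like this. It is only a sketch: `expand_docket_query` and its regex are hypothetical, and the default case types are just the cv and cr pair suggested above, with the list left configurable so bk, ap, or mj could be added.

```python
import re
from typing import Sequence

# Hypothetical detector for the short "dd-dddd" form.
SHORT_DOCKET_RE = re.compile(r"^(\d{2})-(\d+)$")


def expand_docket_query(
    user_input: str, case_types: Sequence[str] = ("cv", "cr")
) -> str:
    """Expand "dd-dddd" into an OR query over the given case types.

    Input that isn't in the short form is returned unchanged.
    """
    match = SHORT_DOCKET_RE.match(user_input.strip())
    if match is None:
        return user_input
    year, seq = match.groups()
    variants = [f"{year}-{case_type}-{seq}" for case_type in case_types]
    # Keep the original form too, as in the dd-dddd clause above.
    variants.append(f"{year}-{seq}")
    return " OR ".join(variants)
```

Calling `expand_docket_query("21-1234", case_types=("cv", "cr", "bk", "ap"))` would cover the bankruptcy types as well, at the cost of a longer query.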