Citation parser fails for statutes with letters in the section number

Open jmesserschmidt1 opened this issue 2 years ago • 3 comments

U.S. Code statutes with letters in the section number appear to go unrecognized. "18 U.S.C. § 1028" and "18 U.S.C. § 1028(a)" are parsed, but "18 U.S.C. § 1028A" is not. I've tried some variations, and the behavior seems consistent.

jmesserschmidt1, Mar 13 '23 20:03

Thanks for sending this along. I think this would be pretty easy to fix, but our code parsers aren't particularly advanced compared to our opinions parsers.

Do you want to take a stab at it?

mlissner, Mar 13 '23 20:03

Sure. I'm not super familiar with the code, but I suspect it might need a variation on the law_section regex, similar to the one that exists for page or volume. This comes up with CFR cites as well (e.g., 17 CFR § 240.10b-5 is currently parsed as 17 CFR 240). So something like (?P<section>\\d+(?:[\\-.:]\\d+){,3}[a-zA-Z]{0,4})
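To see what that kind of pattern would and would not capture, here is a minimal sketch using only Python's standard re module. It is not eyecite's actual law_section regex or parsing code: SECTION_RE and extract_section are hypothetical names, and the pattern is the suggestion above written as a raw string, with the parentheses balanced so the trailing letters fall inside the section group.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not eyecite's real regex): leading digits,
# up to three separator-delimited numeric groups, then up to four trailing letters.
SECTION_RE = re.compile(r"(?P<section>\d+(?:[\-.:]\d+){,3}[a-zA-Z]{0,4})")


def extract_section(cite: str) -> Optional[str]:
    """Return the section token that follows '§', or None if there is none."""
    _, marker, tail = cite.partition("§")
    if not marker:
        return None
    match = SECTION_RE.match(tail.strip())
    return match.group("section") if match else None


for cite in ("18 U.S.C. § 1028", "18 U.S.C. § 1028(a)",
             "18 U.S.C. § 1028A", "17 CFR § 240.10b-5"):
    print(cite, "->", extract_section(cite))
```

Under these assumptions the trailing letter in 1028A is kept, but the CFR example only comes out as 240.10b: the pattern stops at the hyphen after the letter, so sections with interleaved letters and numbers would likely need a further variation, plus tests in any PR.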

jmesserschmidt1, Mar 14 '23 04:03

I don't know that part of the code very well either, but if you want to do a PR with tests that fixes this, I think we'd probably merge it (and release a new version, if desired).

mlissner, Mar 14 '23 14:03