Update Citation model's full span and regexes to account for ReferenceCitation overlaps
With the introduction of ReferenceCitations, we noticed that they sometimes overlap with other citation models.
A reference may be a standalone name ("As seen in Roe, ...") or a name and pincite combination ("As seen in Roe at 223"), so a reference extractor that does not take other citation models into account may incorrectly extract references that are actually part of a fuller citation.
Currently, this is managed by eyecite.helpers.filter_citations, but we have been running into bugs caused by incorrect full span calculations or by incomplete extractors.
overlap with supra
From Example 1
- overlap with supra citation
Twombly, supra, at 553-554
A Reference would be found inside of the Supra due to incomplete full span calculation: https://github.com/freelawproject/eyecite/blob/32ee7566aa079d7285560bdf3e77557740a5fa63/eyecite/find.py#L313-L324
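To make the failure mode concrete, here is a minimal sketch of span-containment filtering. It is not eyecite's code: `Extracted` and `drop_contained_references` are hypothetical names, and the spans are hand-written character offsets. It only illustrates why a supra whose full span stops short of the antecedent name lets the reference through.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Extracted:
    """Hypothetical stand-in for a citation object; not an eyecite class."""
    kind: str
    full_span: Tuple[int, int]  # (start, end) character offsets, end exclusive


def drop_contained_references(citations):
    """Drop each 'reference' whose full span lies inside a non-reference's full span."""
    kept = []
    for cit in citations:
        inside_other = any(
            other is not cit
            and other.kind != "reference"
            and other.full_span[0] <= cit.full_span[0]
            and cit.full_span[1] <= other.full_span[1]
            for other in citations
        )
        if cit.kind == "reference" and inside_other:
            continue
        kept.append(cit)
    return kept


text = "Twombly, supra, at 553-554"
reference = Extracted("reference", (0, len("Twombly")))

# Incomplete full span: covers only the word "supra".
start = text.index("supra")
narrow_supra = Extracted("supra", (start, start + len("supra")))

# Complete full span: covers the antecedent name and the pin cite.
wide_supra = Extracted("supra", (0, len(text)))

survivors_narrow = drop_contained_references([reference, narrow_supra])
survivors_wide = drop_contained_references([reference, wide_supra])
```

With the narrow span the reference is kept alongside the supra; with the wide span only the supra remains, which is the behavior we want.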
overlap with short case citation
From Example 1
- overlap with ShortCaseCitation
Twombly, 550 U. S. (I think this has been solved recently.)
overlap with single-name and pincite full case citation
Example 2:
- Nobelman at 332, 113 S.Ct. 2106 is actually a pincited case citation (?). Currently we would identify it as a Reference followed by a full citation, or maybe a short case citation.
overlap with single name full case citation
From example 1
Not strictly related to References, but to parallel citations. This should probably be split into another issue, but I am noting it here so it can be added as test cases that we know will fail.
- State v. Howard, supra 128-129, 539 A.2d 1203. is a single citation that lists all the parallels, but our system will recognize it as a SupraCitation followed by a CaseCitation.
In the same example, something similar happens with an IdCitation and parallel citations.
I added a logger.error call for unknown overlap types, which is surfacing clues about new citation formats.
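As a rough illustration of that logging hook (a sketch only, not the code in eyecite): `KNOWN_OVERLAPS` and `report_overlap` are hypothetical names, and the pairs listed are assumptions drawn from the cases in this issue, not from the library.

```python
import logging

logger = logging.getLogger("overlap_sketch")

# Hypothetical: type pairs the filter is assumed to know how to resolve.
KNOWN_OVERLAPS = {
    frozenset({"ReferenceCitation", "SupraCitation"}),
    frozenset({"ReferenceCitation", "ShortCaseCitation"}),
    frozenset({"ReferenceCitation", "FullCaseCitation"}),
}


def report_overlap(kind_a, kind_b):
    """Return True for a known overlap pair; log an error and return False otherwise."""
    pair = frozenset({kind_a, kind_b})
    if pair in KNOWN_OVERLAPS:
        return True
    logger.error("Unknown overlap type: %s with %s", kind_a, kind_b)
    return False
```

Using an unordered pair as the key means the check does not depend on which of the two citations was extracted first.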
New overlap type: IdCitation with FullCaseCitation, when the correct citation type would be a FullCaseIdCitation
From this opinion:
...are material. See id. at 248, 106 S. Ct. 2505. A dispute...
The key to reading this is FullCaseCitation.metadata.defendant. That field is only populated by helpers.add_defendant. In this case, it is finding a stopword in the "See" token before the "id". I think the whole string should be a single citation, but we don't support that with our current model.
[
FullCaseCitation('106 S.Ct. 2505', groups={'volume': '106', 'reporter': 'S.Ct.', 'page': '2505'}, metadata=FullCaseCitation.Metadata(parenthetical=None, pin_cite=None, year=None, court='scotus', plaintiff=None, defendant='id. at 248', extra=None, antecedent_guess=None, resolved_case_name_short=None, resolved_case_name=None)),
IdCitation('id.', metadata=IdCitation.Metadata(parenthetical=None, pin_cite='at 248')),
]
Another overlap in the same opinion: a FullCaseCitation with a ShortCaseCitation. In this case, they are actually parallel citations. Again, the behavior comes from helpers.add_defendant.
was pretextual. See McDonnell Douglas, 411 U.S. at 804, 93 S. Ct. 1817
FullCaseCitation('93 S.Ct. 1817', groups={'volume': '93', 'reporter': 'S.Ct.', 'page': '1817'}, metadata=FullCaseCitation.Metadata(parenthetical=None, pin_cite=None, year=None, court='scotus', plaintiff=None, defendant='McDonnell Douglas, 411 U.S. at 804', extra=None, antecedent_guess=None, resolved_case_name_short=None, resolved_case_name=None)),
ShortCaseCitation('411 U.S. at 804', groups={'volume': '411', 'reporter': 'U.S.', 'page': '804'}, metadata=ShortCaseCitation.Metadata(parenthetical=None, pin_cite='804', year=None, court='scotus', antecedent_guess='Douglas'))
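Both defendant values above are consistent with a backward scan that stops at a stopword. The toy model below is an assumption about the general shape of that scan, not a copy of helpers.add_defendant: it works on whitespace-split words rather than eyecite tokens, and `STOP_WORDS` and `guess_defendant` are hypothetical.

```python
# Hypothetical stopword list; eyecite's actual list may differ.
STOP_WORDS = {"see", "citing", "quoting"}


def guess_defendant(words, citation_index):
    """Walk backward from the citation, collecting words until a stopword is hit."""
    collected = []
    for word in reversed(words[:citation_index]):
        if word.strip(".,;").lower() in STOP_WORDS:
            break
        collected.append(word)
    # Restore reading order and drop the comma that preceded the citation.
    return " ".join(reversed(collected)).rstrip(",")


id_words = "are material. See id. at 248, 106 S. Ct. 2505.".split()
id_defendant = guess_defendant(id_words, id_words.index("106"))

short_words = "was pretextual. See McDonnell Douglas, 411 U.S. at 804, 93 S. Ct. 1817".split()
short_defendant = guess_defendant(short_words, short_words.index("93"))
```

Under this model, everything between "See" and the reporter volume is swallowed as the defendant, which is why the IdCitation and the ShortCaseCitation end up inside the FullCaseCitation's span.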