capstone
Time traveling cites
Citation extraction reveals 194,902 cites that point to cases decided later than the citing case. These could be caused by a number of things, including:
- Date metadata errors
- Case text OCR errors
- Case extraction errors (e.g. identifying a "cases citing this case" section of the book as part of the case itself)
- Citation extraction errors (finding strings that look like citations but aren't)
- Actual time travel?
I'm not sure if there's any useful way to analyze these citations, but here they are if anyone can think of anything. For example, we could group them by citing cases with lots of invalid outgoing links (likely either a date metadata problem or an extraction error), or by cited cases with lots of invalid incoming links (likely a date metadata problem?).
Oh, this was produced with:
    select
        cite_from.frontend_url,
        cite_from.id,
        cite_from.decision_date_original,
        ec.normalized_cite,
        cite_to.frontend_url,
        cite_to.id,
        cite_to.decision_date_original
    from capdb_casemetadata cite_from
    inner join capdb_extractedcitation ec
        on cite_from.id = ec.cited_by_id
    inner join capdb_citation cite
        on cite.normalized_cite = ec.normalized_cite
    inner join capdb_casemetadata cite_to
        on cite.case_id = cite_to.id
    where cite_from.in_scope is true
        and cite_from.decision_date_original < cite_to.decision_date_original;
So the columns in the CSV are, in order: cite_from.frontend_url, cite_from.id, cite_from.decision_date_original, ec.normalized_cite, cite_to.frontend_url, cite_to.id, cite_to.decision_date_original.
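The grouping idea mentioned earlier (cases with lots of invalid outgoing links vs. lots of invalid incoming links) could be sketched roughly like this. This is only a minimal sketch, not something that has been run against the real export: it assumes the column order listed above, and the helper names (`count_links`, `load_rows`) and the `has_header` flag are made up for illustration, since I haven't said whether the CSV has a header row.

```python
import csv
from collections import Counter

# Column positions, assuming the column order listed above.
FROM_ID = 1  # cite_from.id
TO_ID = 5    # cite_to.id


def count_links(rows):
    """Count time-traveling cites per citing case and per cited case.

    `rows` is any iterable of sequences in the CSV's column order.
    Returns (outgoing, incoming) Counters keyed by case id.
    """
    outgoing = Counter()
    incoming = Counter()
    for row in rows:
        outgoing[row[FROM_ID]] += 1
        incoming[row[TO_ID]] += 1
    return outgoing, incoming


def load_rows(path, has_header=False):
    """Read the CSV at `path` into a list of rows.

    `has_header` is an assumption about the export; set it to True
    if the first line contains column names.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row
        return list(reader)
```

Calling `count_links(load_rows(path))` and then `outgoing.most_common(20)` or `incoming.most_common(20)` would list the cases most worth a manual look. A case at the top of `outgoing` probably has a bad date or an extraction problem, and a case at the top of `incoming` probably has a bad date.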