
Time traveling cites

Open jcushman opened this issue 4 years ago • 1 comments

Citation extraction reveals 194,902 cites that point to cases with later decision dates. These could be caused by a bunch of things, including:

  • Date metadata errors
  • Case text OCR errors
  • Case extraction errors (e.g. identifying a "cases citing this case" section of the book as part of the case itself)
  • Citation extraction errors (finding strings that look like citations but aren't)
  • Actual time travel?

I'm not sure if there's any useful way to analyze these citations, but here they are if anyone can think of anything. For example, we could group them by cases with lots of invalid outgoing links (likely either a date metadata problem or an extraction error), or by cases with lots of invalid incoming links (likely a date metadata problem?).
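A rough sketch of that grouping, in Python. It assumes the csv has no header row and uses the column order from the query that produced it (citing case id in the second column, cited case id in the sixth). The function name `count_invalid_links` and the sample rows are made up for illustration and are not taken from the real file.

```python
import csv
import io
from collections import Counter

# Column positions, assuming the csv follows the select order of the query.
FROM_ID = 1  # cite_from.id
TO_ID = 5    # cite_to.id


def count_invalid_links(rows):
    """Count time-traveling cites per citing case and per cited case."""
    outgoing = Counter()  # citing case id -> number of invalid outgoing links
    incoming = Counter()  # cited case id -> number of invalid incoming links
    for row in rows:
        outgoing[row[FROM_ID]] += 1
        incoming[row[TO_ID]] += 1
    return outgoing, incoming


# Made-up sample rows standing in for the real csv.
sample = io.StringIO(
    "/a/1,101,1850-01-01,cite-x,/b/9,900,1900-05-01\n"
    "/a/1,101,1850-01-01,cite-y,/b/8,800,1910-02-03\n"
    "/c/3,303,1870-06-10,cite-x,/b/9,900,1900-05-01\n"
)
outgoing, incoming = count_invalid_links(csv.reader(sample))
print(outgoing.most_common(1))  # [('101', 2)]
print(incoming.most_common(1))  # [('900', 2)]
```

Cases at the top of `outgoing` would be the ones to check for date or extraction problems on the citing side, and cases at the top of `incoming` for date problems on the cited side.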

time_travel_cites.csv.zip

jcushman avatar Apr 28 '20 13:04 jcushman

Oh, this was produced with:

select
    cite_from.frontend_url, 
    cite_from.id, 
    cite_from.decision_date_original, 
    ec.normalized_cite, 
    cite_to.frontend_url, 
    cite_to.id, 
    cite_to.decision_date_original
from capdb_casemetadata cite_from
inner join
    capdb_extractedcitation ec on cite_from.id = ec.cited_by_id
inner join
    capdb_citation cite on cite.normalized_cite = ec.normalized_cite
inner join
    capdb_casemetadata cite_to on cite.case_id = cite_to.id
where cite_from.in_scope is true
    -- keep only cites where the citing case is dated earlier than the case it cites
    and cite_from.decision_date_original < cite_to.decision_date_original;

So the columns in the csv are, in order: cite_from.frontend_url, cite_from.id, cite_from.decision_date_original, ec.normalized_cite, cite_to.frontend_url, cite_to.id, cite_to.decision_date_original.
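With that column order, one possible way to triage rows is by the size of the gap between the two dates. The sketch below is only an illustration: it assumes both date values start with a four-digit year, and the `Cite` tuple, the `year_gap` helper, the 50-year threshold, and the sample rows are all hypothetical.

```python
from collections import namedtuple

# Field names are illustrative labels for the seven csv columns, in order.
Cite = namedtuple(
    "Cite",
    ["from_url", "from_id", "from_date", "normalized_cite",
     "to_url", "to_id", "to_date"],
)


def year_gap(cite):
    """Years by which the cited case postdates the citing case.

    Assumes both dates begin with a four-digit year.
    """
    return int(cite.to_date[:4]) - int(cite.from_date[:4])


# Made-up rows, not real data from the file.
rows = [
    ("/a/1", "101", "1850-01-01", "cite-x", "/b/9", "900", "1900-05-01"),
    ("/c/3", "303", "1899-12-30", "cite-y", "/d/4", "404", "1900-01-02"),
]
cites = [Cite(*row) for row in rows]

# Hypothetical threshold: large gaps look more like metadata or extraction
# errors, small gaps might be cases decided around the same time.
large_gaps = [c for c in cites if year_gap(c) >= 50]
print([c.from_id for c in large_gaps])  # ['101']
```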

jcushman avatar Apr 29 '20 00:04 jcushman