eyecite
Discuss test data sets
Also, apart from running the included tests, do you have a test dataset you can recommend?
Originally posted by @step21 in https://github.com/freelawproject/eyecite/issues/86#issuecomment-901964272
(Just splitting this off here, so we can keep the other issue narrow.)
@step21, if you're just looking for some simple datasets to play with, you could use the API at https://api.case.law or the one at https://www.courtlistener.com/api/. The other thing is, if you are just experimenting for the sake of the JOSS review, you could just copy/paste some legal text and throw eyecite at it. For example, you could grab some text from a recent SCOTUS opinion here:
https://www.supremecourt.gov/opinions/slipopinion/20
Does that help?
Here's an example of extracting cites from all of the case.law cases for New Mexico, if it helps to have a larger dataset to play with:
# pip install eyecite requests
import shutil
import zipfile
import lzma
import json
import requests
from pathlib import Path
from eyecite import get_citations

# download data file (66MB) if not already downloaded
download_url = "https://case.law/download/bulk_exports/latest/by_jurisdiction/case_text_open/nm/nm_text.zip"
output_path = "nm_text.zip"
if not Path(output_path).exists():
    print("Downloading to %s ..." % output_path)
    with open(output_path, 'wb') as out_file:
        shutil.copyfileobj(requests.get(download_url, stream=True).raw, out_file)
    print("Done.")

# yield case texts from data file
def get_case_texts():
    with zipfile.ZipFile(output_path, 'r') as zip_archive:
        xz_path = next(path for path in zip_archive.namelist() if path.endswith('/data.jsonl.xz'))
        with zip_archive.open(xz_path) as xz_archive, lzma.open(xz_archive) as jsonlines:
            for line in jsonlines:
                record = json.loads(str(line, 'utf-8'))
                case_body = record['casebody']['data']
                case_text = "\n".join([case_body['head_matter']]+[opinion['text'] for opinion in case_body['opinions']])
                yield record['frontend_url'], case_text

# extract citations
for url, case_text in get_case_texts():
    cites = get_citations(case_text)
    print(url, [c.corrected_citation() for c in cites])
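For trying the reading logic without the 66MB download, here is a minimal sketch that builds a tiny in-memory zip and reads it back the same way. The archive layout (a `data.jsonl.xz` member inside the zip) and the record keys are taken from the script above; the helper names `build_sample_archive` and `iter_case_texts`, the member path, and the sample record are made up for illustration.

```python
import io
import json
import lzma
import zipfile


def build_sample_archive(records):
    """Return the bytes of a zip holding one xz-compressed JSON-lines member."""
    jsonl = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zip_archive:
        # Hypothetical member path; only the '/data.jsonl.xz' suffix matters here.
        zip_archive.writestr("sample/data/data.jsonl.xz", lzma.compress(jsonl))
    return buffer.getvalue()


def iter_case_texts(zip_bytes):
    """Yield (url, text) pairs, mirroring get_case_texts() from the script above."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zip_archive:
        xz_path = next(p for p in zip_archive.namelist() if p.endswith("/data.jsonl.xz"))
        with zip_archive.open(xz_path) as xz_file, lzma.open(xz_file) as jsonlines:
            for line in jsonlines:
                record = json.loads(line.decode("utf-8"))
                body = record["casebody"]["data"]
                parts = [body["head_matter"]] + [o["text"] for o in body["opinions"]]
                yield record["frontend_url"], "\n".join(parts)


# Made-up record with the same keys the script reads.
sample = [{
    "frontend_url": "https://example.invalid/case/1",
    "casebody": {"data": {
        "head_matter": "Made-up head matter",
        "opinions": [{"text": "See 107 N.M. 236."}],
    }},
}]
results = list(iter_case_texts(build_sample_archive(sample)))
print(results)
```

Each yielded text could then be passed to `get_citations` exactly as in the loop above.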
Thanks! It's doing things, so that's a good start. It was mostly for the review, and I am mostly satisfied, but just to be sure I ran this anyway and got a KeyError. As this key is not in your code, it must be something else...?
Downloading to nm_text.zip ...
Done.
https://cite.case.law/nmca/2013/039/4191100/ ['2013-NMCA-039', '107 N.M. 236', '755 P.2d 80', '2007-NMSC-002', '141 N.M. 21', '150 P.3d 971', '1998-NMSC-046', '126 N.M. 396', '970 P.2d 582', '2009-NMCA-081', '146 N.M. 717', '213 P.3d 1146', 'Id.', '2009-NMCA-015', '145 N.M. 533', '202 P.3d 126', '99 N.M. 302', '657 P.2d 629', '2010-NMCA-060', '148 N.M. 367', '237 P.3d 111', '2010-NMCA-085', '148 N.M. 627', '241 P.3d 628', '2010-NMCA-060', '534 U.S. 19', '111 N.M. 319', '805 P.2d 88', '2005-NMCA-061', '137 N.M. 420', '112 P.3d 281', '2000-NMCA-010', '128 N.M. 648', '996 P.2d 911', '1999-NMCA-011', '126 N.M. 460', '971 P.2d 851', '839 F. Supp. 80', '498 P.2d 1240', '2010-NMCA-060', '2008-NMSC-022', '143 N.M. 740', '182 P.3d 121', '2005-NMCA-061', '137 N.M. 420', '112 P.3d 281', '847 F.2d 435', '186 F.2d 683', '388 So. 2d 128', '2006-NMCA-015', '139 N.M. 48', '128 P.3d 476', '107 N.M. at 237', '755 P.2d at 81', 'Id.', '755 P.2d at 82', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', '755 P.2d at 84', 'Id.', 'Id.', '755 P.2d at 83', 'Id.', '755 P.2d at 84', 'Id.', 'Id.', 'Id.', 'Id.', '2007-NMSC-014', '141 N.M. 413', '156 P.3d 704', '2004-NMCA-136', '136 N.M. 658', '103 P.3d 582', '2010-NMSC-035', '148 N.M. 713', '242 P.3d 280', '115 N.M. 159', '848 P.2d 1086', '107 N.M. at 240', '755 P.2d at 84', '107 N.M. at 238', '755 P.2d at 82', '89 F.3d 1423', '956 F.2d 738', 'Id.', '543 P.2d 108', '106 N.M. 492', '745 P.2d 727', '143 N.M. 274', '175 P.3d 942', '2010-NMCA-052', '148 N.M. 277', '234 P.3d 929']
https://cite.case.law/nmca/2013/048/4190470/ ['2013-NMCA-048', '§§', '26 U.S.C. § 501', '§', '2003-NMSC-005', '133 N.M. 97', '61 P.3d 806', '2006-NMCA-095', '140 N.M. 198', '141 P.3d 542', '2008-NMCA-065', '144 N.M. 132', '184 P.3d 444', '2009-NMCA-009', '145 N.M. 494', '200 P.3d 544', '1999-NMCA-156', '128 N.M. 398', '993 P.2d 112', '1999-NMSC-021', '127 N.M. 120', '978 P.2d 327', '2010-NMCA-096', '148 N.M. 934', '242 P.3d 501', '1998-NMSC-050', '126 N.M. 413', '970 P.2d 599', '121 N.M. 764', '918 P.2d 350', '2006-NMSC-004', '139 N.M. 24', '127 P.3d 1111', '2009-NMSC-036', '146 N.M. 473', '212 P.3d 361', '§', '§', '93 N.M. 42', '596 P.2d 255', '2005-NMCA-029', '137 N.M. 103', '107 P.3d 543', '2009-NMCA-009', '2000-NMCA-074', '129 N.M. 413', '9 P.3d 657', '2001-NMCA-042', '130 N.M. 543', '28 P.3d 531']
https://cite.case.law/nmca/2012/116/4190761/ ['2012-NMCA-116', '2011-NMSC-014', '150 N.M. 84', '257 P.3d 904', 'Id.', '2011-NMSC-014', '411 U.S. 778', 'Id.', '408 U.S. 471', '2011-NMSC-014', '91 N.M. 749', '643 P.2d 618', '2011-NMSC-014', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'id.', 'Id.', '612 A.2d 288', 'Id.']
https://cite.case.law/nmca/2013/041/4190492/ ['2013-NMCA-041', '§§', '§]', '2012-NMSC-028', '285 P.3d 595', '2007-NMCA-098', '142 N.M. 319', '164 P.3d 1018', '2004-NMSC-010', '135 N.M. 397', '89 P.3d 69', '121 N.M. 764', '918 P.2d 350', '2009-NMSC-050', '147 N.M. 182', '218 P.3d 868', 'Id.', '2009-NMSC-049', '147 N.M. 177', '218 P.3d 863', '118 N.M. 234', '880 P.2d 845', '77 N.M. 742', '427 P.2d 258', '§', '§', '§§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '§', '2009-NMCA-097', '147 N.M. 6', '216 P.3d 256', '2007-NMCA-069', '141 N.M. 686', '160 P.3d 595', '113 N.M. 231', '824 P.2d 1033', '113 N.M. at 236', '824 P.2d at 1038', '2005-NMCA-128', '138 N.M. 588', '124 P.3d 566', '2012-NMSC-026', '283 P.3d 853', '863 A.2d 976', '2010-NMCA-053', '148 N.M. 322', '236 P.3d 41', '106 N.M. 613', '747 P.2d 259', '2011-NMCA-016', '149 N.M. 420', '249 P.3d 1243']
https://cite.case.law/nmca/2013/025/4190553/ ['2013-NMCA-025', '§', '§', '2005-NMCA-120', '138 N.M. 466', '122 P.3d 50', '2006-NMCA-106', '140 N.M. 230', '141 P.3d 1284', '2004-NMCA-104', '136 N.M. 240', '96 P.3d 801', '39 Duq. L. Rev. 567', '§', '§', '§§', '§§', '§', '§', 'Id.', '2012-NMSC-029', '285 P.3d 622', 'Id.', 'Id.', 'Id.', '1999-NMCA-018', '126 N.M. 579', '973 P.2d 256', 'Id.', 'Id.', 'Id.', 'Id.', '906 P.2d 122', '717 N.E.2d 322', 'Ohio Rev. Code Ann. § 3103.04', 'Id.', '717 N.E.2d at 326', 'Id.', 'Id.', '44 Cal. Rptr. 330', '20 Cal. Rptr. 2d 582', 'Cal. Code § 5102', '44 Cal. Rptr. at 336', 'Id.', 'Id.', '268 Cal. Rptr. 501', 'Cal. Code § 5102', 'Id.', '268 Cal. Rptr. at 503', 'Id.', '2012-NMSC-029', '44 Cal. Rptr. at 336', '119 N.M. 609', '894 P.2d 386', '721 N.E.2d 73', '44 Cal. Rptr. at 333', '39 Duq. L. Rev. 567', '94 N.M. 706', '616 P.2d 419', '2011-NMSC-041', '150 N.M. 654', '265 P.3d 705', 'Id.', 'Id.', '94 N.M. at 708', '616 P.2d at 421', 'Id.', '§', '2012-NMCA-084', '284 P.3d 410', 'Id.', '1999-NMSC-001', '126 N.M. 438', '971 P.2d 829', '2012-NMCA-017', '2012-NMCERT-001', 'Id.', 'Id.', '2005-NMCA-045', '137 N.M. 339', '110 P.3d 1076', '1999-NMCA-152', '128 N.M. 345', '992 P.2d 896', 'id.', 'Id.', '2004-NMSC-019', '135 N.M. 621', '92 P.3d 633', 'Id.', '2005-NMSC-031', '138 N.M. 365', '120 P.3d 447', '2006-NMSC-001', '138 N.M. 700', '126 P.3d 516', '2009-NMSC-004', '145 N.M. 513', '201 P.3d 844', 'Id.', 'Id.', '94 N.M. 17', '606 P.2d 1111', '82 N.M. 333', '481 P.2d 412']
https://cite.case.law/nmca/2013/047/4191281/ ['2013-NMCA-047', '§', '§', '2004-NMCA-111', '136 N.M. 301', '97 P.3d 633', '2009-NMCA-110', '147 N.M. 127', '217 P.3d 613', '101 N.M. 694', '688 P.2d 12', 'Id.', '688 P.2d at 20', 'Id.', '688 P.2d at 15', '120 N.M. 734', '906 P.2d 266', 'Id.', '1999-NMCA-143', '128 N.M. 371', '993 P.2d 85', '1999-NMCA-143', '1999-NMCA-143', '2004-NMCA-111', '115 N.M. 710', '858 P.2d 86', 'Id.', '858 P.2d at 92', 'Id.', '858 P.2d at 92', 'Id.', 'Id.', '101 N.M. at 699', '688 P.2d at 17']
https://cite.case.law/nmca/2013/028/4190584/ ['2013-NMCA-028', '1997-NMSC-044', '123 N.M. 778', '945 P.2d 996', '2001-NMCA-094', '131 N.M. 195', '34 P.3d 139', '121 N.M. 38', '908 P.2d 731', 'Id.', '2000-NMCA-085', '129 N.M. 547', '10 P.3d 871', 'Id.', 'Id.', '2002-NMSC-007', '131 N.M. 758', '42 P.3d 1207', '117 N.M. 11', '868 P.2d 656', 'Id.', '§', '§', '80 N.M. 340', '455 P.2d 844', '121 N.M. at 44', '908 P.2d at 737', 'Id.', '2007-NMCA-035', '141 N.M. 328', '154 P.3d 703', '2000-NMSC-002', '128 N.M. 482', '994 P.2d 728', '2005-NMCA-010', '136 N.M. 723', '104 P.3d 1114', '§', '2007-NMCA-160', '143 N.M. 96', '173 P.3d 18', '2008-NMSC-048', '144 N.M. 663', '191 P.3d 521', '2009-NMSC-025', '146 N.M. 357', '210 P.3d 783', '112 N.M. 3', '810 P.2d 1223', '2007-NMSC-032', '142 N.M. 120', '164 P.3d 1', '112 N.M. at 13', '810 P.2d at 1233', '2008-NMSC-048', 'Id.', '2007-NMSC-032', 'Id.', '§', '2007-NMSC-032', 'Id.', '120 N.M. 486', '903 P.2d 228', '2007-NMSC-032', '119 N.M. 252', '889 P.2d 860', '2011-NMCA-121', '267 P.3d 820', '2012-NMCERT-008', '296 P.3d 491', '2007-NMSC-032', '2007-NMSC-032', '§', '2007-NMSC-032', 'Id.', 'Id.', 'Id.', '§', '§', '2003-NMCA-147', '134 N.M. 705', '82 P.3d 72', '§', '§', '§', '§', 'Kan. Stat. Ann. § 21-5408', '2006-NMSC-011', '131 P.3d 61', '§', '2007-NMSC-032', 'Id.', 'Id.', 'Id.', '§', '112 N.M. 554', '817 P.2d 1196', '2010-NMSC-020', '148 N.M. 381', '237 P.3d 683', '102 N.M. 274', '694 P.2d 922', '§', '§', '112 N.M. at 562', '817 P.2d at 1204', '2007-NMSC-032', 'Id.', '112 N.M. at 14', '810 P.2d at 1234', '949 A.2d 1092', '547 P.2d 720', '459 P.2d 225', '2012-NMCA-112', '289 P.3d 238', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', '2010-NMSC-005', '147 N.M. 557', '226 P.3d 656', '51 A.3d 970', '949 A.2d at 1121', '1999-NMCA-065', '127 N.M. 362', '981 P.2d 295', '2010-NMSC-005', '119 N.M. at 260', '889 P.2d at 868', '2012-NMCA-112', '2011-NMCA-018', '149 N.M. 294', '248 P.3d 336', '2011-NMCERT-001', '150 N.M. 559', '263 P.3d 901', '115 N.M. 6', '846 P.2d 312', '466 U.S. 668', '2011-NMCA-018', '115 N.M. at 17', '846 P.2d at 323', '2011-NMCA-018', '115 N.M. at 16', '846 P.2d at 322', '2002-NMSC-005', '131 N.M. 709', '42 P.3d 814', '2002-NMSC-027', '132 N.M. 657', '54 P.3d 61', 'id.', 'Id.', '2002-NMSC-027', 'Id.', 'Id.', 'Id.', '2006-NMCA-031', '139 N.M. 147', '130 P.3d 208', '2009-NMSC-018', '146 N.M. 142', '207 P.3d 1119', '2010-NMSC-041', '148 N.M. 747', '242 P.3d 314', '1998-NMCA-034', '124 N.M. 726', '955 P.2d 195', '1997-NMCA-117', '124 N.M. 261', '948 P.2d 1209', '2012-NMSC-008', '275 P.3d 110', '§', '2006-NMCA-110', '140 N.M. 356', '142 P.3d 944', '2006-NMCA-088', '140 N.M. 126', '140 P.3d 547', '98 N.M. 213', '647 P.2d 415', 'Id.', '1997-NMSC-004', '122 N.M. 794', '932 P.2d 484', '98 N.M. at 215', '647 P.2d at 417', 'id.', '2010-NMSC-041', 'Id.', '2010-NMSC-041', '2010-NMSC-041', '2001-NMCA-032', '130 N.M. 319', '24 P.3d 351', '2007-NMSC-057', '143 N.M. 7', '172 P.3d 144', '2006-NMCA-088', '2009-NMCA-102', '147 N.M. 26', '216 P.3d 276', '2000-NMSC-037', '130 N.M. 1', '15 P.3d 491', '2000-NMCA-033', '129 N.M. 47', '1 P.3d 429']
https://cite.case.law/nmca/2013/006/4191483/ ['2013-NMCA-006', '§§', '2011-NMSC-033', '150 N.M. 398', '259 P.3d 803', '2009-NMSC-021', '146 N.M. 256', '208 P.3d 901', 'Id.', '2011-NMSC-033', '2009-NMSC-021', '2013-NMCA-014', '293 P.3d 902', '2009-NMSC-021', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'Id.', 'id.', 'Id.', '2013-NMCA-014', 'Id.', 'Id.', '42 C.F.R. § 483.12', '2011-NMSC-033', '2009-NMSC-021', '2011-NMSC-033', '2009-NMSC-021', '2013-NMCA-014', '2013-NMCA-014', '2013-NMCA-014']
https://cite.case.law/nm/37/48/ ['125 P. 609', '126 Okla. 114', '258 P. 863', 'supra.', '47 Kan. 283', '27 P. 997', '141 Mass. 74', '6 N.E. 757', '§', '132 Mich. 395', '93 N.W. 914', '59 Minn. 111', '60 N.W. 1081', '191 P. 460', '§', '§', '199 P. 373', '14 Cal. App. 250', '111 P. 631', 'supra,', '68 W. Va. 493', '70 S.E. 119']
https://cite.case.law/nm/37/222/ ['202 P. 687', 'supra,', '219 P. 794', '§', '58 P. 393', '88 S.W. 363', '115 Wis. 317', '91 N.W. 107', '79 Wis. 546', '48 N.W. 653', '180 Wis. 577', '193 N.W. 353', '234 P. 311']
https://cite.case.law/nm/37/212/ ['§', '28 Stat. 278', '33 Stat. 811', '§', '§', '236 F. 340', '255 F. 683', '288 F. 187', 'supra,', '236 F. 342', 'supra,', 'supra,', '132 S.E. 800', '81 S.E. 418', '135 P. 553', 'supra,', 'supra,', '116 F. 145', '41 Ind. App. 620', '84 N.E. 555']
https://cite.case.law/nm/37/597/ ['194 P. 862']
https://cite.case.law/nm/37/478/ ['§', '295 P. 424', '218 P. 787', '§']
https://cite.case.law/nm/37/474/ ['§', '§', '236 P. 735', 'supra,', '247 P. 270']
https://cite.case.law/nm/37/101/ ['89 P. 259']
https://cite.case.law/nm/37/312/ ['246 P. 910', '299 P. 1008']
https://cite.case.law/nm/37/91/ ['§', '256 P. 179', 'supra.', '§', 'supra.', '240 P. 469', '298 P. 410', '290 P. 793', '222 P. 912', '256 P. 179', '76 Cal. 624', '18 P. 686', '287 P. 290', '147 P. 916', '249 P. 108', '85 P. 393', '§', '136 F. 168', '69 C.C.A. 80', '49 Ala. 567', '65 Colo. 258', '176 P. 302', '17 Ill. App. 30', '67 F. 384', '106 Wis. 387', '82 N.W. 302', '62 Minn. 498', '65 N.W. 84', '124 Cal. 568', '57 P. 561', '34 Cal. App. 272', '167 P. 299']
https://cite.case.law/nm/37/559/ ['221 Mo. App. 85', '290 S.W. 96', '162 Mo. App. 408', '142 S.W. 757', '178 S.W. 52', '69 Mo. App. 1']
https://cite.case.law/nm/37/226/ []
https://cite.case.law/nm/37/600/ ['§', '22 Cal. 191', '§', '287 P. 64', '44 A. 161', '59 A. 565']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-3-7520a1175d19> in <module>()
     32 for url, case_text in get_case_texts():
     33     cites = get_citations(case_text)
---> 34     print(url, [c.corrected_citation() for c in cites])

1 frames
/usr/local/lib/python3.7/dist-packages/eyecite/models.py in corrected_citation(self)
    200         if self.edition_guess:
    201             return self.matched_text().replace(
--> 202                 self.groups["reporter"], self.edition_guess.short_name
    203             )
    204         return self.matched_text()

KeyError: 'reporter'
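Until the library itself is fixed, one caller-side workaround is to fall back to the raw matched text whenever `corrected_citation()` raises. This is only a sketch: `safe_corrected_citation` and the `StubCite` class are hypothetical names for illustration, and the stub's "correction" is a placeholder, not eyecite's real behavior. Only `corrected_citation()` and `matched_text()` come from the traceback above.

```python
def safe_corrected_citation(cite):
    """Return cite.corrected_citation(), or the raw matched text on KeyError."""
    try:
        return cite.corrected_citation()
    except KeyError:
        return cite.matched_text()


class StubCite:
    """Hypothetical stand-in for a citation object, used only for this demo."""

    def __init__(self, text, fail):
        self._text = text
        self._fail = fail

    def matched_text(self):
        return self._text

    def corrected_citation(self):
        if self._fail:
            # Simulates the KeyError: 'reporter' from the traceback.
            raise KeyError("reporter")
        # Placeholder normalization, standing in for the real correction.
        return self._text.replace("N. M.", "N.M.")


stubs = [StubCite("107 N. M. 236", fail=False), StubCite("some citation", fail=True)]
labels = [safe_corrected_citation(c) for c in stubs]
print(labels)
```

In the script above, the last line of the loop would then call `safe_corrected_citation(c)` instead of `c.corrected_citation()`.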
Looks like models.py needs some guard code around line 201 to ensure the "reporter" key is present. Something along the lines of:
if self.edition_guess:
    if "reporter" in self.groups:
        return self.matched_text().replace(self.groups["reporter"], self.edition_guess.short_name)
return self.matched_text()
I'd have to look closer at what's calling that section of code to see what assumptions that breaks though. But would you like me to work on this and get a patch in?
Yeah, seems like a good one to fix. Worth yanking into its own issue though, if you don't mind.
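For the new issue, the guard proposed above can be exercised in isolation with a small sketch. `GuardedCitation` is a hypothetical stand-in that carries only the attributes visible in the traceback (`groups`, `edition_guess`, `matched_text()`); it is not eyecite's actual class, and the sample values are made up.

```python
from types import SimpleNamespace


class GuardedCitation:
    """Hypothetical stand-in with only the attributes the traceback shows in use."""

    def __init__(self, text, groups, edition_guess=None):
        self._text = text
        self.groups = groups
        self.edition_guess = edition_guess

    def matched_text(self):
        return self._text

    def corrected_citation(self):
        # Substitute only when an edition guess AND a "reporter" group both exist.
        if self.edition_guess and "reporter" in self.groups:
            return self.matched_text().replace(
                self.groups["reporter"], self.edition_guess.short_name
            )
        return self.matched_text()


edition = SimpleNamespace(short_name="N.M.")  # made-up edition object
with_reporter = GuardedCitation("107 N. M. 236", {"reporter": "N. M."}, edition)
without_reporter = GuardedCitation("§ 501", {}, edition)
no_edition = GuardedCitation("107 N. M. 236", {"reporter": "N. M."})

print(with_reporter.corrected_citation())
print(without_reporter.corrected_citation())
print(no_edition.corrected_citation())
```

With the guard, a citation that has no "reporter" group falls through to `matched_text()` instead of raising. Whether that fallback is the right behavior for every caller is the open question raised above.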