William Palin
@mlissner This PR is meant to improve the extraction of text from PDFs by using a few additional simple rules to decide if text is extracted appropriately. Those rules include...
```
doctor | File "", line 241, in _call_with_frames_removed
doctor | File "/opt/app/doctor/urls.py", line 3, in
doctor | from . import views
doctor | File "/opt/app/doctor/views.py", line 34, in doctor...
```
Unfortunately, we are seeing occasional crashes on CL caused by failures to extract HTML from certain HTML files. These occur mostly with HTML files downloaded from the NY lower courts,...
Need to add timeouts to the tests in Doctor. Our timeouts in CL were too short, and we would have caught this if the Doctor tests had used timeouts.
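A rough sketch of what a timeout-aware test could look like. This is not taken from the Doctor test suite: `run_with_timeout`, `fake_slow_call`, and `CL_TIMEOUT` are hypothetical names, and the slow call is simulated with `time.sleep` instead of a real request to the service.

```python
import time
import unittest
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Assumption: the budget the caller (CL) allows for one call, in seconds.
CL_TIMEOUT = 1.0


def fake_slow_call(delay):
    """Hypothetical stand-in for a call to the service; sleeps, then answers."""
    time.sleep(delay)
    return "ok"


def run_with_timeout(func, timeout, *args, **kwargs):
    """Run func in a worker thread; raise FutureTimeout if it takes too long."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=timeout)
    finally:
        # Do not block on the worker; it finishes on its own in the background.
        pool.shutdown(wait=False)


class TimeoutTest(unittest.TestCase):
    def test_fast_call_fits_budget(self):
        self.assertEqual(run_with_timeout(fake_slow_call, CL_TIMEOUT, 0.01), "ok")

    def test_slow_call_exceeds_budget(self):
        with self.assertRaises(FutureTimeout):
            run_with_timeout(fake_slow_call, 0.05, 0.5)
```

The idea is that the test uses the same budget as the caller, so a call that is too slow for CL fails in the Doctor tests first.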
Example: 71A A.F.T.R.2d (RIA) 3011 fails citation parsing because volume must be a digit.
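A minimal sketch of a volume pattern that would accept this example. It is an illustration only, not the parser's actual regex: `CITE_RE` and `parse_aftr` are hypothetical names, and the pattern is hard-coded to this one reporter.

```python
import re

# Assumption: a volume is digits with an optional trailing capital letter,
# so both "71" and "71A" are accepted.
CITE_RE = re.compile(
    r"\b(?P<volume>\d+[A-Z]?)\s+"
    r"(?P<reporter>A\.F\.T\.R\.2d \(RIA\))\s+"
    r"(?P<page>\d+)"
)


def parse_aftr(text):
    """Return volume, reporter and page as a dict, or None if nothing matches."""
    match = CITE_RE.search(text)
    return match.groupdict() if match else None


print(parse_aftr("71A A.F.T.R.2d (RIA) 3011"))
```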
See: Williams v. IRS, 2007-2 U.S. Tax Cas. (CCH) P50,568 (E.D. Mo. 2007)
Full citation not understood. Still investigating, but a number of citations have (page?) See examples:
```
Metzler v. Arcadian Corp. 1997 OSHD (CCH) P31,311
CCH OSHD P 20,091 (1975)
```
See examples:
```
2015 0667 (La.App. 1 Cir. 02/04/16); Court of Appeal of Louisiana, First Circuit
2011 2269 (La.App. 1 Cir. 11/29/12); Court of Appeal of Louisiana, First Circuit
2007...
```
See example: TRAVELERS INDEM. CO. v. HYLTON, 1972 U.S. Dist. LEXIS 12735
```
1972 Auto. Cas. (CCH) P7530
```
This is slightly different from the other format that has PXX,...
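One way to treat both paragraph-style page formats uniformly, sketched under the assumption that the page is a `P` followed by digits, with an optional space and optional comma grouping. `PARA_PAGE_RE` and `paragraph_number` are hypothetical names, not part of the existing parser.

```python
import re

# Assumption: "P", an optional space, then digits with optional ",ddd" groups.
# Covers P50,568, P 20,091 and the comma-less P7530.
PARA_PAGE_RE = re.compile(r"\bP\s?(?P<number>\d+(?:,\d{3})*)\b")


def paragraph_number(citation):
    """Return the paragraph number as an int, or None if there is none."""
    match = PARA_PAGE_RE.search(citation)
    if match is None:
        return None
    return int(match.group("number").replace(",", ""))


for cite in (
    "2007-2 U.S. Tax Cas. (CCH) P50,568",
    "CCH OSHD P 20,091 (1975)",
    "1972 Auto. Cas. (CCH) P7530",
):
    print(cite, "->", paragraph_number(cite))
```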
See:
```
McCahon 166
Armstrong v. Wyandotte Bridge Co., McCahon 166
Dallam 614
Allen v. Scott, Dallam 614
```
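Assuming the difficulty with these is that the reporter name is followed directly by a page with no volume, a sketch of matching them could look like this. `REPORTERS` and `find_volumeless` are hypothetical names, and the reporter list holds only the two names from the examples.

```python
import re

# Assumption: reporters cited without a volume, taken from the examples only.
REPORTERS = ("McCahon", "Dallam")

NO_VOLUME_RE = re.compile(
    r"\b(?P<reporter>" + "|".join(map(re.escape, REPORTERS)) + r")"
    r"\s+(?P<page>\d+)\b"
)


def find_volumeless(text):
    """Return (reporter, page) pairs for every volume-less citation in text."""
    return [(m["reporter"], m["page"]) for m in NO_VOLUME_RE.finditer(text)]


print(find_volumeless("Armstrong v. Wyandotte Bridge Co., McCahon 166"))
print(find_volumeless("Allen v. Scott, Dallam 614"))
```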