Kevin Ramirez
Add correct regex to identify it as nominative reporter
If you try to open a csv file named all.csv you will get the following error: 
This new command will allow us to import opinions from manually collected files. It was originally intended for this issue (https://github.com/freelawproject/courtlistener/issues/1958). With the data already extracted from the nhd PDFs, we can...
- This change implements with_best_text() in opinion_tabs_content.html to optimize opinion text retrieval.
- It also adds an extra value named _original_text_source_ when using with_best_text() because in some cases we need...
This problem was found while working on the parent issue. I was able to identify 2660 opinions with incorrect author_str values; in some cases the text is incorrect, in other...
django-storages no longer falls back to `AWS_ACCESS_KEY_ID` or `AWS_SECRET_ACCESS_KEY` when empty strings are provided in the settings for the dev environment, which results in this error: `An error occurred...
We need a Django management command that allows us to import court opinions collected manually or semi-automatically (e.g. via local runs of Juriscraper). This will serve as an intermediate solution...
- Improve extension and MIME extraction when Magika fails on certain files
- Add magic and other fallbacks to handle tricky formats
- Strip metadata first to avoid detection bias...
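The fallback idea in the list above can be sketched as a chain of detectors tried in order until one gives a specific answer. This is only an illustration, not the code from the change: `detect_mime`, `sniff_signature`, and `guess_from_name` are hypothetical names, and standard-library stand-ins are used here in place of Magika and magic.

```python
import mimetypes
from typing import Callable, Optional, Sequence

# Results treated as "no useful answer", so the next detector is tried.
GENERIC = {None, "", "application/octet-stream"}

Detector = Callable[[bytes, str], Optional[str]]


def sniff_signature(data: bytes, filename: str) -> Optional[str]:
    """Hypothetical content-based detector using a tiny signature table."""
    signatures = {
        b"\x89PNG\r\n\x1a\n": "image/png",
        b"GIF89a": "image/gif",
        b"PK\x03\x04": "application/zip",
    }
    for prefix, mime in signatures.items():
        if data.startswith(prefix):
            return mime
    return None


def guess_from_name(data: bytes, filename: str) -> Optional[str]:
    """Last-resort detector: guess from the file extension only."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def detect_mime(
    data: bytes,
    filename: str,
    detectors: Sequence[Detector] = (sniff_signature, guess_from_name),
) -> str:
    """Return the first specific MIME type produced by the detector chain."""
    for detector in detectors:
        try:
            mime = detector(data, filename)
        except Exception:
            # A failing detector should not abort the whole chain.
            continue
        if mime not in GENERIC:
            return mime
    return "application/octet-stream"
```

Ordering the detectors from content-based to name-based means the extension is consulted only when the bytes give no specific answer.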
A valid PDF ([2025_33502.pdf](https://nycourts.gov/reporter/pdfs/2025/2025_33502.pdf)) was misclassified as an Adobe Illustrator (ai) file. The file opens normally and starts with %PDF-1.6. ``` head -c 500 /home/quevon24/PycharmProjects/pythonProjects/2025_33502.pdf|xxd 00000000: 2550 4446 2d31 2e36...
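The manual `head | xxd` check above can be reproduced in Python. This is a minimal sketch: `looks_like_pdf` and `pdf_version` are hypothetical helper names, and the 1024-byte search window is a leniency assumption rather than something stated in the report.

```python
from typing import Optional

PDF_MAGIC = b"%PDF-"


def looks_like_pdf(data: bytes, search_window: int = 1024) -> bool:
    """Return True if the PDF header marker appears near the start of the data."""
    return PDF_MAGIC in data[:search_window]


def pdf_version(data: bytes, search_window: int = 1024) -> Optional[str]:
    """Return the version string after the marker (e.g. '1.6'), or None."""
    idx = data[:search_window].find(PDF_MAGIC)
    if idx == -1:
        return None
    start = idx + len(PDF_MAGIC)
    # The version is three ASCII characters such as "1.6".
    return data[start:start + 3].decode("ascii", errors="replace")


# The bytes shown in the xxd output above decode to the PDF header.
header = bytes.fromhex("2550 4446 2d31 2e36")
print(header, looks_like_pdf(header), pdf_version(header))
```

A header check like this only confirms that the file claims to be a PDF; it does not validate the rest of the file.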
Text extraction microservice fails on some PDFs because pdftotext rejects them with errors like:
```
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't read xref table
```
These PDFs...
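One way to recognize this failure mode is to scan pdftotext's stderr for the two messages quoted above. The sketch below is an assumption-laden illustration, not the microservice's code: `has_structural_error` and `run_pdftotext` are hypothetical names, and what to do once the error is detected is left to the caller.

```python
import subprocess
from typing import Tuple

# The two messages quoted in the report.
STRUCTURAL_ERRORS = (
    "Couldn't find trailer dictionary",
    "Couldn't read xref table",
)


def has_structural_error(stderr: str) -> bool:
    """Return True if stderr contains one of the known structural errors."""
    return any(marker in stderr for marker in STRUCTURAL_ERRORS)


def run_pdftotext(path: str) -> Tuple[str, bool]:
    """Run pdftotext on a file and report whether it hit a structural error.

    Requires the pdftotext binary on PATH; "-" sends the text to stdout.
    """
    proc = subprocess.run(
        ["pdftotext", path, "-"],
        capture_output=True,
        text=True,
        check=False,  # inspect stderr ourselves instead of raising
    )
    return proc.stdout, has_structural_error(proc.stderr)
```

Separating the stderr check from the subprocess call keeps the detection logic testable without the binary installed.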