olmocr
Question: Does `process_pdf` only read PDFs from S3?
It seems that the `process_pdf` method in pipeline.py only processes PDFs from S3 storage, but README.md says I can specify one or more local PDFs using the `--pdfs` parameter. So where is the code that processes local PDFs?
I'm wondering the same thing. Why can't this leverage local PDFs without having to host them in S3 storage?
Yes, I ran into the same problem. After reading the source code, I found that I could only use PDFs hosted in the cloud, which is pretty frustrating.
Newer versions have supported local files for a while now.
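For anyone on an older version who wants to patch this themselves, here is a rough sketch of how a reader could accept both `s3://` URIs and local paths. This is not olmocr's actual code: `is_s3_path`, `read_pdf_bytes`, and the `s3_fetch` callback are hypothetical names made up for illustration, and the S3 download is left as an injected function so the snippet runs without any cloud dependency.

```python
from pathlib import Path
from typing import Callable, Optional
from urllib.parse import urlparse


def is_s3_path(path: str) -> bool:
    """Return True if the path looks like an s3://bucket/key URI."""
    return urlparse(path).scheme == "s3"


def read_pdf_bytes(
    path: str,
    s3_fetch: Optional[Callable[[str, str], bytes]] = None,
) -> bytes:
    """Read a PDF from S3 or from the local filesystem.

    s3_fetch is a hypothetical callback taking (bucket, key) and returning
    the object's bytes; in practice it could wrap an S3 client.
    """
    if is_s3_path(path):
        if s3_fetch is None:
            raise ValueError("an s3_fetch callback is required for s3:// paths")
        parsed = urlparse(path)
        # netloc is the bucket name; the key is the path without its leading slash
        return s3_fetch(parsed.netloc, parsed.path.lstrip("/"))
    # Anything that is not an s3:// URI is treated as a local file
    return Path(path).read_bytes()
```

With something like this, the rest of the pipeline only ever sees bytes and does not need to care where the PDF came from.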