nosia
Improve PDF parsing
Actual behavior
I noticed that on some complex PDFs with tables, pdftotext produces better results than the pdf-reader gem.
pdftotext: https://www.xpdfreader.com/pdftotext-man.html
Issue in Langchainrb: https://github.com/patterns-ai-core/langchainrb/issues/682
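For reference, here is a minimal sketch of how pdftotext could be called from Ruby to compare its output with pdf-reader on the same file. It assumes the pdftotext binary is installed and on the PATH. The helper names pdftotext_command and extract_text_with_pdftotext are hypothetical and not part of nosia or Langchainrb. The -layout flag asks pdftotext to preserve the physical page layout, which may be why tables come out better.

```ruby
require "open3"

# Build the argument list for pdftotext (hypothetical helper).
# "-layout" keeps the physical layout of the page as far as possible.
# The trailing "-" sends the extracted text to stdout instead of a file.
def pdftotext_command(pdf_path, layout: true)
  cmd = ["pdftotext"]
  cmd << "-layout" if layout
  cmd + [pdf_path, "-"]
end

# Run pdftotext and return the extracted text (hypothetical helper).
# Assumes the pdftotext binary is installed and available on PATH.
def extract_text_with_pdftotext(pdf_path, layout: true)
  stdout, stderr, status = Open3.capture3(*pdftotext_command(pdf_path, layout: layout))
  raise "pdftotext failed: #{stderr.strip}" unless status.success?
  stdout
end
```

Calling extract_text_with_pdftotext("report.pdf") on a PDF with tables and comparing the result with the pdf-reader output should make the difference easy to see. Passing the arguments as an array avoids shell quoting problems with file names that contain spaces.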
Expected behavior
Good results when parsing complex PDFs, including those with tables.