pdf-reader
Extra spaces between letters in a single word
I noticed this gem has problems parsing some PDFs where the text is not necessarily clean.
For instance, this file: https://www.jstor.org/stable/3684663
Some parts of it get output like: "a b o u t a r e g r e s s i o n t o o r i g i n a l c h a o s"
However, the file itself doesn't seem to be the problem, because Python's PyPDF2 extracts the same passage correctly as "about a regression to original chaos".
Do you think there is some step that this reader is missing? Or is there an option I should set when using PDF::Reader to get it to read such PDFs better?
I too am experiencing this issue.
Same here.
I worked around it with a gsub. It only works when the run-together words are in PascalCase:
TheFirstWord => The First Word, using gsub(/([a-z])([A-Z])/, '\1 \2')
thefirstword => thefirstword ???
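For anyone trying the same workaround, here is a minimal sketch of the post-processing described in this thread. Both helper names (collapse_spaced_letters and split_on_case_change) are made up for illustration and are not part of pdf-reader. The first helper is an assumption about how one might rejoin the spaced-out letters; the second is just the gsub quoted above. This only patches the extracted string and does not address whatever causes the extra spaces during extraction.

```ruby
# Hypothetical post-processing helpers; neither is part of pdf-reader.

# Rejoin runs of single letters separated by single spaces,
# e.g. "a b o u t" -> "about". Real word boundaries inside the run are lost.
def collapse_spaced_letters(text)
  text.gsub(/\b(?:[[:alpha:]] ){2,}[[:alpha:]]\b/) { |run| run.delete(' ') }
end

# The gsub from the comment above: insert a space at each
# lowercase-to-uppercase transition.
def split_on_case_change(text)
  text.gsub(/([a-z])([A-Z])/, '\1 \2')
end

puts collapse_spaced_letters("a b o u t a r e g r e s s i o n")  # aboutaregression
puts split_on_case_change("TheFirstWord")                        # The First Word
puts split_on_case_change("thefirstword")                        # thefirstword
```

As the last line shows, all-lowercase text has no case change to split on, so the original word boundaries cannot be recovered this way. That would probably need a dictionary-based word splitter or a fix in the extraction itself.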