pdf-reader
Unable to extract text from PDF/A (with FlateDecode)
Hi! I tried this PDF with the following code:
require 'rubygems'
require 'pdf/reader'

filename = "pdfa.pdf"
PDF::Reader.open(filename) do |reader|
  reader.pages.each do |page|
    puts page.text
  end
end
But the result was something like:
Is there any way to extract text from it?
I get the same results when trying to extract text using pdf-reader.
I also tried extracting text with pdftotext (which uses libpoppler) and Firefox (which uses pdf.js). Neither of them worked.
I haven't checked the PDF contents in detail, but if poppler and pdf.js also have trouble with it, I suspect it's a broken file.
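As a rough first step toward checking the contents, one could inflate the file's FlateDecode (zlib-compressed) streams and read the raw page operators directly. The helper below, `inflate_flate_streams`, is hypothetical and not part of pdf-reader. It is a minimal sketch that assumes plain Flate compression and finds streams with a regex instead of parsing the PDF. As a result, it ignores /Length, other filters, object streams, and encryption.

```ruby
require 'zlib'

# Hypothetical helper, not part of pdf-reader: find every
# "stream ... endstream" section in the raw PDF bytes and try to inflate it
# as FlateDecode (zlib) data. This is a regex scan, not a real PDF parser.
def inflate_flate_streams(pdf_bytes)
  # One capture group per match, so each scan result is a one-element array.
  segments = pdf_bytes.scan(/stream\r?\n(.*?)\r?\nendstream/m).map(&:first)

  inflated = segments.map do |data|
    begin
      Zlib::Inflate.inflate(data)
    rescue Zlib::Error
      nil # not Flate-encoded, encrypted, or the regex cut the stream badly
    end
  end

  inflated.compact
end
```

For example, `inflate_flate_streams(File.binread("pdfa.pdf"))` would return the decompressed streams of the reported file. If readable operators such as `BT`, `Tf` and `Tj` show up, the compression itself is probably fine. Text extraction then also depends on each font providing a usable mapping from glyph codes to Unicode, for example a /ToUnicode CMap.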