pdf-reader
Page.text fails when font size changes on a single line
When reading text from a document that uses different font sizes on the same line of text, I have seen it fail in two ways: extra spaces are added, or characters are overwritten. I am wondering whether this is something pdf-reader is intended to handle accurately.
Example file: "hello_world_caps.pdf"
Example spec (fails):
describe "#text" do
  ...
  it "can deal with different height characters on the same line" do
    @browser = PDF::Reader.new(pdf_spec_file("hello_world_caps"))
    @page = @browser.page(1)
    expect(@page.text).to eq("HELLO WORLD") # actually returns "HELLWORLD"
  end
end
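To see how mixed font sizes can produce both symptoms, here is a minimal Ruby sketch of a layout that snaps every character to a fixed-width column grid. It is a simplified model, not pdf-reader's actual PageLayout code: `TextRun` and `grid_layout` are hypothetical names, the positions are made up, and every glyph is assumed to be half as wide as its font size (24pt capitals, 16pt small capitals).

```ruby
# Hypothetical text run: x position and width in points, plus the characters drawn.
TextRun = Struct.new(:x, :width, :text)

# Snap each character to a column that is col_width points wide.
# A later character landing on an occupied column overwrites it;
# columns that nothing lands on are filled with spaces.
def grid_layout(runs, col_width)
  line = []
  runs.sort_by(&:x).each do |run|
    char_width = run.width / run.text.length
    run.text.each_char.with_index do |char, i|
      col = (run.x + i * char_width).fdiv(col_width).round
      line[col] = char
    end
  end
  line.map { |c| c || " " }.join
end

# "HELLO WORLD" in small caps: 24pt capitals (12pt wide), 16pt small caps (8pt wide).
runs = [
  TextRun.new(0, 12, "H"),
  TextRun.new(12, 32, "ELLO"),
  TextRun.new(52, 12, "W"),
  TextRun.new(64, 32, "ORLD")
]

puts grid_layout(runs, 12) # prints HELOWORD
puts grid_layout(runs, 8)  # prints H ELLO WORLD
```

With columns sized for the large glyphs, neighbouring small glyphs round into the same column and overwrite each other (and the word gap disappears); with columns sized for the small glyphs, the wider capital spills into an extra empty column that becomes a spurious space.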
Thanks for a great sample file that demonstrates the issue.
I am wondering whether this is something pdf-reader is intended to handle accurately.
I would classify it as a known issue that I'd like to handle better than we currently do. The algorithm in PageLayout probably needs a significant overhaul, which is a bummer.
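One possible direction for such an overhaul, sketched purely as an illustration and not as pdf-reader's plan, is to stop snapping characters to a grid and instead append runs in x order, inserting a space only when the horizontal gap between runs is large relative to the font size. `TextRun`, `gap_layout` and the 0.2 ratio are hypothetical, and the sketch assumes the runs have already been grouped into a single line.

```ruby
# Hypothetical text run: x position and width in points, font size, characters drawn.
TextRun = Struct.new(:x, :width, :font_size, :text)

# Append runs left to right so no character can overwrite another.
# A space is inserted only when the gap after the previous run exceeds
# space_ratio times the smaller of the two font sizes (an assumed heuristic).
def gap_layout(runs, space_ratio = 0.2)
  text = +""
  prev = nil
  runs.sort_by(&:x).each do |run|
    if prev
      gap = run.x - (prev.x + prev.width)
      threshold = space_ratio * [prev.font_size, run.font_size].min
      text << " " if gap > threshold
    end
    text << run.text
    prev = run
  end
  text
end

runs = [
  TextRun.new(0, 12, 24, "H"),
  TextRun.new(12, 32, 16, "ELLO"),
  TextRun.new(52, 12, 24, "W"),
  TextRun.new(64, 32, 16, "ORLD")
]

puts gap_layout(runs) # prints HELLO WORLD
```

Because spacing is decided from the actual gap between runs rather than from a shared column width, a change in font size mid-line no longer shifts characters into or out of grid cells.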