pdf-reader
page_count undefined method `[]' for nil:NilClass
In v1.3, v1.2, and v1.0, when I run this code to iterate through all pages:
pdf_textfile = File.open('aero_text.txt', 'w')
reader.pages.each do |page|
  pdf_textfile << page.text
end
pdf_textfile.close
I get the output:
gems/pdf-reader-1.0.0/lib/pdf/reader.rb:138:in `page_count': undefined method `[]' for nil:NilClass (NoMethodError)
from /Users/sands/.rvm/gems/ruby-1.9.2-p320@newdev/gems/pdf-reader-1.0.0/lib/pdf/reader.rb:224:in `pages'
from pdf2text.rb:11:in
This points to the pages hash being nil for some reason in page_count, in reader.rb:
def page_count
  pages = @objects.deref(root[:Pages])
  @page_count ||= pages[:Count]
end
The reader initializes on the PDF file correctly, because I can call reader.pdf_version and it reports back fine. Getting to the page level (on OS X 10.8.2) simply doesn't work for this PDF, and the error message gives no clue as to why.
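The failure mode described above can be reproduced without a PDF. The following is a minimal sketch, not pdf-reader's actual code: `StubObjects`, `MissingPageTreeError`, and `guarded_page_count` are hypothetical names invented for illustration. It assumes that `deref` returns nil when a reference cannot be resolved, and shows how an explicit check would turn the bare NoMethodError into a message that names the missing entry.

```ruby
# Hypothetical stand-in for the reader's object table; not pdf-reader's real class.
class StubObjects
  def initialize(table)
    @table = table
  end

  # Returns the object for a reference, or nil when the reference is unknown.
  def deref(ref)
    @table[ref]
  end
end

# Hypothetical error class used only in this sketch.
class MissingPageTreeError < StandardError; end

# Same lookup as page_count, but with an explicit check on the result
# so a nil page tree is reported with some context.
def guarded_page_count(objects, root)
  pages = objects.deref(root[:Pages])
  if pages.nil?
    raise MissingPageTreeError, "could not resolve the /Pages entry #{root[:Pages].inspect}"
  end
  pages[:Count]
end

# A table where the page tree resolves normally.
good = StubObjects.new({ :pages_ref => { :Count => 3 } })
puts guarded_page_count(good, { :Pages => :pages_ref })

# A table where the reference points at nothing, as in the report.
broken = StubObjects.new({})
begin
  guarded_page_count(broken, { :Pages => :pages_ref })
rescue MissingPageTreeError => e
  puts "error: #{e.message}"
end
```

Without the nil check, the second call would fail on `pages[:Count]` with the same `undefined method '[]' for nil:NilClass` seen in the trace.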
Cheers,
Sands Fish
Thanks for the report.
To understand the cause I'd really have to see the problem PDF. Are you able to share it with me via email ([email protected]'d.au)?
James, does your email address have a single-quote character in it? Gmail doesn't like it. Will send the PDF once I can.
-S
Damn you autocorrect. My address is [email protected]
Thanks for the file. If we can discover the underlying issue I'll manually create a new file for a test case and delete your sample.
When I use the pdf_text binary to try and trigger the same issue you're getting, I see a different exception.
⚡ pdf_text foo.pdf
/home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/filter/flate.rb:34:in `rescue in filter': Error occured hile inflating a compressed stream (Zlib::DataError: invalid distance too far back) (PDF::Reader::MalformedPDFError)
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/filter/flate.rb:17:in `filter'
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/stream.rb:63:in `block in unfiltered_data'
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/stream.rb:62:in `each'
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/stream.rb:62:in `each_with_index'
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/stream.rb:62:in `unfiltered_data'
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/object_stream.rb:11:in `initialize'
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/object_hash.rb:86:in `new'
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/object_hash.rb:86:in `[]'
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader/object_hash.rb:97:in `object'
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader.rb:138:in `page_count'
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/lib/pdf/reader.rb:225:in `pages'
from /home/jh/.gem/ruby/1.9.1/gems/pdf-reader-1.3.0/bin/pdf_text:11:in `<top (required)>'
from /home/jh/.gem/ruby/1.9.1/bin/pdf_text:23:in `load'
from /home/jh/.gem/ruby/1.9.1/bin/pdf_text:23:in `<main>'
Do you get anything like that or always the nil exception? What version of ruby are you running?
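For context on the trace above: the Zlib::DataError is raised while inflating a compressed stream and is then re-raised as a higher-level error. A rough sketch of that wrapping pattern follows, using only Ruby's standard zlib library. `MalformedStreamError` and `inflate_or_explain` are hypothetical names, and this is not the code in flate.rb.

```ruby
require 'zlib'

# Hypothetical error class standing in for a "malformed input" error.
class MalformedStreamError < StandardError; end

# Inflate a zlib-compressed string, converting low-level Zlib errors
# into one error type whose message names the underlying cause.
def inflate_or_explain(data)
  Zlib::Inflate.inflate(data)
rescue Zlib::Error => e
  raise MalformedStreamError,
        "error while inflating a compressed stream (#{e.class}: #{e.message})"
end

# A valid stream round-trips back to the original bytes.
compressed = Zlib::Deflate.deflate('BT /F1 12 Tf (hello) Tj ET')
puts inflate_or_explain(compressed)

# Bytes that are not a zlib stream trigger the wrapped error.
begin
  inflate_or_explain('this is not a zlib stream')
rescue MalformedStreamError => e
  puts e.message
end
```

Rescuing `Zlib::Error` rather than only `Zlib::DataError` also covers truncated streams, which zlib reports with a different subclass.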
James, always this one. Version info below...
sands$ ruby pdf2text.rb aeronautics-gravity-reducing-propulsion.pdf
PDF Version : 1.6
/gems/pdf-reader-1.0.0/lib/pdf/reader.rb:138:in `page_count': undefined method `[]' for nil:NilClass (NoMethodError)
from /Users/sands/.rvm/gems/ruby-1.9.2-p320@newdev/gems/pdf-reader-1.0.0/lib/pdf/reader.rb:224:in `pages'
from pdf2text.rb:11:in
sands$ ruby -v
ruby 1.9.2p320 (2012-04-20 revision 35421) [x86_64-darwin12.2.0]
sands$ gem list |grep pdf
pdf-reader (1.0.0)
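The two traces in this thread come from different gem versions (1.0.0 here, 1.3.0 in the pdf_text run), so it can help to print the versions from inside the failing process rather than from the shell. A small sketch, assuming RubyGems is available; `environment_report` is a hypothetical helper name:

```ruby
require 'rubygems'

# Collect the interpreter details and the version of a gem as actually
# loaded in this process (nil when the gem has not been loaded).
def environment_report(gem_name)
  spec = Gem.loaded_specs[gem_name]
  {
    :ruby     => RUBY_VERSION,
    :platform => RUBY_PLATFORM,
    :gem      => gem_name,
    :version  => spec ? spec.version.to_s : nil
  }
end

report = environment_report('pdf-reader')
puts "ruby #{report[:ruby]} on #{report[:platform]}"
puts "#{report[:gem]}: #{report[:version] || 'not loaded'}"
```

Calling this after `require 'pdf-reader'` in the failing script would show which installed version the script really picked up.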
Can you paste the contents of pdf2text.rb?
Note that I'm not clear on how to access the page content for the aggregation I'm attempting, but it errors out before it gets there, so it's moot for now.
require 'pdf-reader'
reader = PDF::Reader.new("aeronautics-gravity-reducing-propulsion.pdf")
puts "PDF Version : #{reader.pdf_version}"
pdf_textfile = File.open('aero_text.txt', 'w')
reader.pages.each do |page|
  pdf_textfile << page.text # or page.raw_content ?
end
pdf_textfile.close
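On the question of how to aggregate page content: `page.text` is the call used in the script above. The sketch below isolates the aggregation step and uses the block form of `File.open`, so the file is closed even if a page raises partway through. `write_page_text` and `StubPage` are hypothetical names; the stub pages only stand in for objects that respond to `text`, so the example runs without a PDF.

```ruby
require 'tmpdir'

# Write the text of each page to a file, one page per line group.
# `pages` can be any collection whose elements respond to #text.
def write_page_text(pages, path)
  File.open(path, 'w') do |out|
    pages.each do |page|
      out << page.text << "\n"
    end
  end
end

# Stand-in page objects so the sketch runs without a PDF.
StubPage = Struct.new(:text)

Dir.mktmpdir do |dir|
  path = File.join(dir, 'aero_text.txt')
  write_page_text([StubPage.new('first page'), StubPage.new('second page')], path)
  puts File.read(path)
end
```

With a real document, `reader.pages` would be passed in place of the stub array, assuming the reader gets as far as building the page list.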
Unfortunately I can't reproduce this error on my system, so I can't fix it. I'll leave the ticket open in case I have a flash of inspiration.
sorry!
Ah, that's too bad. Maybe I can find another system to attempt it on and rule out a part of the stack that might be at fault.