Convenience methods, e.g. extracting named destinations
I would like to extract a list of all named destinations in a PDF file, so that I can navigate directly to a position from the command line of the PDF reader or via a link (e.g. HREF="http://www.example.com/myfile.pdf#glossary").
pypdf has a convenient method for this (https://unix.stackexchange.com/questions/246622/list-named-destinations-in-a-pdf).
Could this be added to pdf-reader as well?
I'd be more than happy to see a convenience method for named destinations added.
I probably don't have time to add it myself, but I'm happy to review a PR.
If you could give me a hint on how to access the named destinations, I will propose a convenience method.
So far I have not found out how to get at them.
Thanks for offering to contribute!
The implementation in pypdf shows some helpful clues: https://github.com/mstamy2/PyPDF2/blob/18a2627adac13124d4122c8b92aaa863ccfb8c29/PyPDF2/pdf.py#L1350-L1389
By coincidence, this spec file in the pdf-reader repo has some named destinations: spec/data/pdflatex.pdf.
This code fragment demonstrates the general approach to replicating the pypdf code in pdf-reader:
diff --git a/lib/pdf/reader.rb b/lib/pdf/reader.rb
index 0ac514b..6d7e830 100644
--- a/lib/pdf/reader.rb
+++ b/lib/pdf/reader.rb
@@ -206,6 +206,17 @@ module PDF
PDF::Reader::Page.new(@objects, num, :cache => @cache)
end
+ def named_destinations
+ names = root[:Names]
+ return {} if names.nil?
+
+ dests = @objects.deref(names)[:Dests]
+ return {} if dests.nil?
+
+ @objects.deref(dests)
+ end
+
+
private
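One caveat about the fragment above: the object stored under /Names /Dests is the root of a name tree, so the actual name/destination pairs sit in /Names arrays that may be nested under /Kids. The following is only a rough sketch of how that tree could be flattened. `flatten_name_tree` and its `resolve` argument are hypothetical names, not part of pdf-reader; `resolve` stands in for something like `@objects.deref`, and the sample data uses plain Ruby hashes rather than a real PDF.

```ruby
# Sketch: flatten a PDF name tree into a { name => value } Hash.
# `resolve` is an assumption: any callable that turns an indirect
# reference into the object it points to (identity by default).
def flatten_name_tree(node, resolve = ->(obj) { obj })
  node = resolve.call(node)
  return {} unless node.is_a?(Hash)

  result = {}

  # Leaf data: /Names is a flat array [key1, value1, key2, value2, ...]
  names = resolve.call(node[:Names])
  if names.is_a?(Array)
    names.each_slice(2) do |key, value|
      result[key] = resolve.call(value)
    end
  end

  # Intermediate nodes: /Kids is an array of child nodes to walk recursively
  kids = resolve.call(node[:Kids])
  if kids.is_a?(Array)
    kids.each { |kid| result.merge!(flatten_name_tree(kid, resolve)) }
  end

  result
end

# Made-up sample tree with two leaf nodes under one root
sample_tree = {
  :Kids => [
    { :Names => ["chapter.1", [:page1, :XYZ, 72, 720, nil]] },
    { :Names => ["glossary", [:page9, :Fit]] }
  ]
}
sample_dests = flatten_name_tree(sample_tree)
```

With the real library, the call might look like `flatten_name_tree(dests, ->(o) { @objects.deref(o) })`. The sketch does not guard against cyclic /Kids references, and it ignores the older style where destinations live in a /Dests dictionary directly on the catalog.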
In terms of specs, I'd love to see a single new spec in spec/integration_spec.rb that confirms the full output of the method for spec/data/pdflatex.pdf. Maybe something roughly like this:
$ git diff spec/integration_spec.rb
diff --git a/spec/integration_spec.rb b/spec/integration_spec.rb
index 446373e..8ee51f1 100644
--- a/spec/integration_spec.rb
+++ b/spec/integration_spec.rb
@@ -1168,4 +1168,16 @@ describe PDF::Reader, "integration specs" do
end
end
end
+
+ context "extracts named destinations" do
+ let(:filename) { pdf_spec_file("pdflatex") }
+
+ it "extracts text correctly" do
+ PDF::Reader.open(filename) do |reader|
+ expect(page.named_destinations).to eq({
+ :Foo => "Bar"
+ })
+ end
+ end
+ end
end
fine, I will do my best ...
Hi, I started to implement this, even if I don't exactly know what I am doing :-) I more or less ported the pypdf method. I have a few questions:
- The pypdf method retrieves all named destinations. So shouldn't named_destinations be a method of Reader?
- I could not find out how I can get the text representing the destination. In pdf-reader there is no equivalent to the class Destination available in pypdf. So I do not really know what to return.
- pdflatex.pdf holds about 90 destinations. Wouldn't it be sufficient to expect the count of destinations and the details of one particular entry?
I started to implement this
great!
the pypdf method retrieves all named destinations. So shouldn't named_destinations be a method of Reader?
Yes. I'm not fully across named destinations, but my understanding is they're a document-level concept and not page level, so I think the method should go on the PDF::Reader class.
I could not find out how I can get the text representing the destination. In pdf-reader there is no equivalent to the class Destination available in pypdf. So I do not really know what to return.
Hmm. I'm not familiar enough with destinations to know the answer off the top of my head. I'd suggest opening a PR with as much as you can get, and then hopefully I can help fill in the gaps.
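For what it's worth, in the PDF specification a destination is not text but an array of the form [page, /XYZ, left, top, zoom] (or one of the other fit types), optionally wrapped in a dictionary under a /D key. One possible return value is therefore a plain Hash built from that array. The sketch below is only an illustration under that assumption: `DEST_FIELDS` and `describe_destination` are hypothetical names, not pdf-reader API, and the page element is passed through unchanged (in a real file it would be a reference to a page object, not a page number).

```ruby
# Parameters that follow the fit type in an explicit destination array,
# as listed in the PDF specification's table of explicit destinations.
DEST_FIELDS = {
  :XYZ   => [:left, :top, :zoom],
  :Fit   => [],
  :FitH  => [:top],
  :FitV  => [:left],
  :FitR  => [:left, :bottom, :right, :top],
  :FitB  => [],
  :FitBH => [:top],
  :FitBV => [:left]
}.freeze

# Hypothetical helper: turn a raw (already dereferenced) destination
# value into a plain Hash, or nil if the value is not recognised.
def describe_destination(raw)
  # A named destination may be a dictionary holding the array under /D
  raw = raw[:D] if raw.is_a?(Hash)
  return nil unless raw.is_a?(Array) && raw.size >= 2

  page, fit, *args = raw
  fields = DEST_FIELDS[fit]
  return nil if fields.nil?

  # Pair each expected field with its argument; missing ones become nil
  { :page => page, :fit => fit }.merge(fields.zip(args).to_h)
end

# Made-up example values, not taken from pdflatex.pdf
xyz_dest  = describe_destination([:page3, :XYZ, 72, 720, nil])
fith_dest = describe_destination({ :D => [:page5, :FitH, 500] })
```

Returning plain hashes keeps the method free of a new Destination class; whether that is the right shape for pdf-reader is an open design question.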
pdflatex.pdf holds about 90 destinations. Wouldn't it be sufficient to expect the count of destinations and the details of one particular entry?
Yes, your suggestion sounds fine.
Yes. I'm not fully across named destinations, but my understanding is they're a document-level concept and not page level, so I think the method should go on the PDF::Reader class.
I have added the method to both classes. I also have opened the PR.