
Table of Contents?

Open · BlitzInternet opened this issue · 7 comments

Is there any way to create a table of contents with this lib? It looks great, and I'd hate to switch to another one because this is impossible, but I can't figure out from the docs how it would work.

BlitzInternet, Aug 05 '20 13:08

There is no feature that automatically creates a TOC for you. That means you would have to know all your headlines in advance and build it manually. You could use the destination and goTo text options to link a TOC entry to a headline. Here is an example of using destination and goTo https://github.com/rkusa/pdfjs/blob/master/test/pdfs/annotations/text-destination.js
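A minimal sketch of the manual approach, assuming a pdfjs-style `doc` whose `text(str, opts)` accepts the `destination` and `goTo` options used in the linked test, plus `pageBreak()`. The helpers `destinationName` and `writeDocumentWithToc` are hypothetical, the headings must be known up front, and page numbers are left out because they are not known in advance.

```javascript
// Hypothetical helper: derive a unique destination name from a heading.
function destinationName(heading, index) {
  const slug = heading
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse non-alphanumerics into dashes
    .replace(/^-|-$/g, "");      // trim leading/trailing dashes
  // The index keeps names unique even if two headings share a title.
  return `sec-${index}-${slug}`;
}

// Hypothetical helper: writes the TOC entries first, then the sections.
// Assumes `doc` has text(str, opts) with `goTo`/`destination` and pageBreak().
function writeDocumentWithToc(doc, sections) {
  const names = sections.map((s, i) => destinationName(s.heading, i));

  doc.text("Table of Contents");
  sections.forEach((s, i) => {
    // Clicking this entry jumps to the destination registered below.
    doc.text(s.heading, { goTo: names[i] });
  });

  sections.forEach((s, i) => {
    doc.pageBreak();
    // Registers a named destination at the heading's position.
    doc.text(s.heading, { destination: names[i] });
    doc.text(s.body);
  });
  return names;
}
```

Any object with `text()` and `pageBreak()` methods can stand in for the document, which makes it easy to check which options end up on which call.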

rkusa, Aug 06 '20 06:08

Thanks for the help. Is there a way to get the page number the target headline ends up on? Keeping count of how often pageBreak was called would be trivial, but I think most page breaks happen automatically, e.g. when using large tables.

BlitzInternet, Aug 06 '20 07:08

Oops, while answering your question I completely forgot about the most important part of a TOC: the page numbers. Unfortunately, it is indeed not possible to reliably keep track of page numbers, since pdfjs does not pre-calculate the layout of the whole PDF and instead writes out the PDF as soon as (roughly) the next line has been calculated. Properly supporting TOCs would therefore require a bigger change. So I am afraid that pdfjs does not solve your use case for now, and I don't think I'll have time to add TOC support in the near future, but I'll keep the issue open, since it is definitely a valid feature request.

rkusa, Aug 06 '20 10:08

I can see a way this could work without changing the basic workings of your lib: the TOC needs to be written last. It can then either go on the last page or into a separate PDF that is then combined with the content PDF.

For that to work, the library would need to emit events during the writing process that an app can listen to, each containing the current element, the current page, and the current position. An app could build an index from those events and then create the TOC. Do you think this could be implemented?
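To illustrate the proposal, here is a hypothetical sketch of an app-side listener. pdfjs does not emit such events; the `element` event name and its `{ element, page, position }` payload are assumptions modeled on the fields listed above, and a plain Node.js `EventEmitter` stands in for the library.

```javascript
const { EventEmitter } = require("events");

// Hypothetical: builds a TOC index from assumed "element" events.
function collectHeadings(emitter) {
  const index = [];
  emitter.on("element", ({ element, page, position }) => {
    // Only headings become TOC entries; everything else is ignored.
    if (element.type === "heading") {
      index.push({ title: element.text, page, y: position.y });
    }
  });
  return index; // filled in while the document is being written
}

// Simulated writer standing in for the library during rendering.
const writer = new EventEmitter();
const index = collectHeadings(writer);
writer.emit("element", { element: { type: "heading", text: "Intro" }, page: 1, position: { y: 720 } });
writer.emit("element", { element: { type: "paragraph", text: "..." }, page: 1, position: { y: 700 } });
writer.emit("element", { element: { type: "heading", text: "Usage" }, page: 3, position: { y: 500 } });

// After rendering, the index can be turned into TOC lines.
const tocLines = index.map(({ title, page }) => `${title} ... ${page}`);
```

Because `EventEmitter.emit` calls listeners synchronously, the index is complete as soon as the last element has been written.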

BlitzInternet, Aug 12 '20 10:08

> the toc needs to be written last. Then it can either be on the last page, or on a different PDF that is then combined with the content-pdf.

It is actually quite easy to write the TOC page last, but move it to the beginning of the document.
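A minimal sketch of the reordering step, modeled on a plain array of pages in which the TOC pages were appended last. `moveTocToFront` and its `tocPages` parameter are hypothetical; this does not show how pdfjs exposes its pages internally.

```javascript
// Hypothetical: move the last `tocPages` entries to the front, keeping order.
function moveTocToFront(pages, tocPages) {
  if (tocPages < 0 || tocPages > pages.length) {
    throw new RangeError("tocPages out of range");
  }
  const splitAt = pages.length - tocPages;
  const content = pages.slice(0, splitAt); // pages written first
  const toc = pages.slice(splitAt);        // TOC pages written last
  return [...toc, ...content];
}

moveTocToFront(["c1", "c2", "c3", "toc1"], 1); // ["toc1", "c1", "c2", "c3"]
```

In a PDF file, page order is defined by the page tree (its `/Kids` arrays), so moving the TOC to the front means reordering page references rather than re-rendering any content.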

> In order for that to work, you would need to emit events during the writing process that an app can listen to. Content of the event: current element, current page, current position. It'd be possible to create an index from that and then create the ToC. Do you think this could be implemented?

I was thinking the same 👍. The main challenge that makes this less straightforward is that some parts of the document are written into chunks, and those chunks are sometimes moved to the next page after content has already been written to them. Example: each table row is a chunk. The content of all cells in a row is written to the chunk line by line. Once a page break is encountered, pdfjs checks how many lines were already written and whether it makes sense to move the chunk to the next page (basically preferring to move a row to the next page over starting it at the end of a page with only one or two lines). In that case, the events for the lines that were already written would contain the wrong page number, since the chunk got moved to the next page.
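A hypothetical sketch of the row decision described above, with an assumed `minLinesBeforeBreak` threshold (pdfjs's actual heuristic may differ). It also shows why an event emitted eagerly, while the row is still being written, goes stale once the row moves.

```javascript
// Hypothetical heuristic; pdfjs's real rule and threshold may differ.
function shouldMoveChunk({ linesOnCurrentPage, totalLines, minLinesBeforeBreak = 3 }) {
  // The chunk fits on the current page: nothing to decide.
  if (linesOnCurrentPage >= totalLines) return false;
  // Too few lines would be left at the bottom of the page,
  // so the whole chunk moves to the next page instead.
  return linesOnCurrentPage < minLinesBeforeBreak;
}

// Events emitted while the row was being written on page 4.
const eagerEvents = [
  { line: 0, page: 4 },
  { line: 1, page: 4 },
];

// Only 2 of 6 lines fit before the page break, so the row is moved.
const moved = shouldMoveChunk({ linesOnCurrentPage: 2, totalLines: 6 });
// moved is true: the row now starts on page 5,
// yet eagerEvents still report page 4 for the lines already written.
```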

Another thing to consider: the whole approach only works if page numbering starts after the TOC, but I think that is fine for a lot of use cases.

rkusa, Aug 12 '20 11:08

Could you not send the event after that potential move has occurred? If it could include the start and end position, that would just be a bonus.

As for the page numbers: I imagine I would apply a simple offset when creating the TOC from the index. It should be relatively easy to estimate whether the TOC fits on one page or is longer.
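A minimal sketch of the offset idea, assuming a fixed number of TOC entries per page (the hypothetical `entriesPerPage`; in practice it depends on font size and margins) and index entries whose `page` counts content pages from 1. If numbering starts after the TOC, no offset is needed; otherwise every entry shifts by the number of TOC pages.

```javascript
// Hypothetical: how many pages the TOC needs, given a fixed capacity.
function tocPageCount(entryCount, entriesPerPage) {
  return Math.max(1, Math.ceil(entryCount / entriesPerPage));
}

// Index entries are { title, page }, with page counted from the first content page.
function buildToc(index, entriesPerPage, numberingStartsAfterToc = true) {
  // If page labels restart after the TOC, the content page is the label.
  // Otherwise every entry is pushed back by the pages the TOC occupies.
  const offset = numberingStartsAfterToc ? 0 : tocPageCount(index.length, entriesPerPage);
  return index.map(({ title, page }) => ({ title, page: page + offset }));
}

const entries = [{ title: "Intro", page: 1 }, { title: "Usage", page: 7 }];
buildToc(entries, 40);        // pages stay 1 and 7
buildToc(entries, 40, false); // one TOC page, so pages become 2 and 8
```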

BlitzInternet, Aug 12 '20 11:08

> Could you not send the event after that potential moving has occurred? If it can include start and end position, that would just be a bonus.

Yes, that is possible. It would just require keeping track of the items that need an update until it is certain where they end up. I am afraid that this is still more work than I currently have time for 😕
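A hypothetical sketch of that bookkeeping, not pdfjs code: events for lines inside a chunk are held in `ChunkEventBuffer` until the chunk's placement is final, then emitted with the corrected page and position. The `moveTo(newPage, dy)` hook, and the assumption that a move shifts every line by the same vertical offset `dy`, are illustrative.

```javascript
// Hypothetical: holds events for one chunk until its page is final.
class ChunkEventBuffer {
  constructor(emit) {
    this.emit = emit;  // callback that delivers settled events to the app
    this.pending = []; // events whose page/position may still change
  }

  // Record a line written into the chunk at its provisional position.
  record(element, page, y) {
    this.pending.push({ element, page, y });
  }

  // The layout moved the whole chunk: update every pending event.
  moveTo(newPage, dy) {
    for (const ev of this.pending) {
      ev.page = newPage;
      ev.y += dy;
    }
  }

  // Placement is final: deliver the corrected events and reset.
  flush() {
    const settled = this.pending;
    this.pending = [];
    settled.forEach((ev) => this.emit(ev));
  }
}

const delivered = [];
const buf = new ChunkEventBuffer((ev) => delivered.push(ev));
buf.record({ type: "cell", text: "Row 12" }, 4, 60);
buf.record({ type: "cell", text: "Row 12, line 2" }, 4, 48);
buf.moveTo(5, 700); // the row was moved to the top of page 5
buf.flush();        // both events now report page 5, at y 760 and 748
```

Because nothing is emitted before `flush()`, the app never sees a provisional page number; the cost is that TOC entries arrive only once each chunk is settled.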

rkusa, Aug 12 '20 12:08