Feature request: table of contents generation without requiring to know in advance how many pages it will span

Open alallier opened this issue 4 years ago • 12 comments

I am getting this exception:

fpdf.errors.FPDFException: The rendering function passed to FPDF.insert_toc_placeholder triggered to many page breaks: page 9 was reached while it was expected to span only 1 pages

I checked the code here, though, and it doesn't seem like it should fail.

alallier avatar Apr 24 '21 01:04 alallier

I was looking at the code and I see that in the placeholder arguments you can specify how many pages it will span. This seems to work, although you need to know your page count ahead of time, which is impossible when running against dynamic datasets.

alallier avatar Apr 24 '21 02:04 alallier

Yeah, this is a current limitation of this implementation.

I think I see how I can make this fully dynamic now, I'm going to have a look at it.

Lucas-C avatar Apr 24 '21 08:04 Lucas-C

Without knowing in advance how many pages the ToC will span, we will have to shift page numbers according to its size once rendered.

The challenge will then be to increment all references to page numbers:

  • in self.pages : dict containing pages and metadata
  • in self.annots : link & text annotations
  • in self.links : internal links inside the document
  • in self.struct_builder.doc_struct_elem.k (all page.id)
  • in self._outline
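To illustrate the kind of renumbering listed above, here is a minimal sketch on toy data. The two dicts only stand in for structures such as self.pages and self.links; they are assumptions made for illustration and do not reflect how fpdf2 stores its data internally. The function name and its parameters are hypothetical too.

```python
def shift_page_references(pages, links, first_shifted_page, offset):
    """Shift every page number >= first_shifted_page by offset.

    pages: dict mapping page number -> page content (toy stand-in)
    links: dict mapping link name -> target page number (toy stand-in)
    """
    def shift(number):
        # Pages located before the ToC overflow keep their number.
        return number + offset if number >= first_shifted_page else number

    shifted_pages = {shift(number): content for number, content in pages.items()}
    shifted_links = {name: shift(target) for name, target in links.items()}
    return shifted_pages, shifted_links


# The ToC was expected on page 2 only, but needs 2 extra pages:
pages = {1: "cover", 2: "toc", 3: "chapter 1", 4: "chapter 2"}
links = {"cover": 1, "chapter 1": 3, "chapter 2": 4}
new_pages, new_links = shift_page_references(pages, links, first_shifted_page=3, offset=2)
```

Every structure that stores a page number would need the same treatment, which is why the change touches so many places.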

I fear this will introduce quite some code, and lower code readability...

As a workaround for now, you can always generate your documents with increasing values of pages passed to .insert_toc_placeholder, and stop as soon as no exception is raised. It's an ugly approach, but it should work I think.
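A minimal sketch of that retry loop, assuming a hypothetical callable build_document(toc_pages) that creates a brand new document, passes toc_pages as the pages argument of .insert_toc_placeholder, and returns the result. The helper name, its parameters and the upper bound are all illustrative; with fpdf2 the error_type argument would be fpdf.errors.FPDFException.

```python
def build_with_growing_toc(build_document, error_type=Exception, max_toc_pages=50):
    """Call build_document(toc_pages) with toc_pages = 1, 2, 3, ...

    Returns (toc_pages, document) for the first attempt that does not raise.
    build_document is a caller-supplied (hypothetical) function.
    """
    for toc_pages in range(1, max_toc_pages + 1):
        try:
            return toc_pages, build_document(toc_pages)
        except error_type:
            # The ToC did not fit: try again with one more reserved page.
            continue
    raise RuntimeError(f"ToC still does not fit in {max_toc_pages} pages")
```

Each attempt rebuilds the document from scratch instead of reusing the object that just failed, so the placeholder is only ever defined once per document.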

Lucas-C avatar Apr 24 '21 08:04 Lucas-C

A workaround would be to estimate the number of pages; if the estimate is off, one can use the page number reported in the error to set the correct value. This takes at most 2 iterations.
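A rough sketch of how the number could be read back from the error text. It relies on the message wording quoted at the top of this thread, which may differ between fpdf2 versions, and it assumes that the reported page is the last page the ToC reached. Both points should be checked against the version in use. The function name and the toc_start_page parameter are hypothetical.

```python
import re

# Wording taken from the exception message quoted in this thread.
PAGE_REACHED = re.compile(r"page (\d+) was reached while it was expected to span only (\d+) pages")


def toc_pages_from_error(message, toc_start_page):
    """Return the ToC page count suggested by the error message, or None.

    Assumption: the reported page is the last page used by the ToC,
    and the ToC starts on toc_start_page (1-based).
    """
    match = PAGE_REACHED.search(message)
    if match is None:
        return None
    reached_page = int(match.group(1))
    return reached_page - toc_start_page + 1
```

The second attempt would then rebuild the document with the returned value.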

jwinkel13 avatar Mar 11 '22 12:03 jwinkel13

If there is still any interest in this topic: I did an implementation of that. It is a little bit hacky though:

  1. create the ToC at the end of the pdf -- use placeholders for the current page-numbers (like number of pages -> {nb})
  2. reorder the pages and put the ToC in place
  3. fix the links
  4. replace the placeholders -- with the same limitations as the replacement of {nb}
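The four steps can be sketched on toy data as follows. This is not the implementation mentioned above: pages are plain strings in a list, links map a name to a 0-based page index, and the "{page:NAME}" placeholder syntax is made up for this example.

```python
def move_toc_into_place(pages, toc_length, toc_position, links):
    """Move the last toc_length pages to index toc_position and fix references.

    pages: list of page contents, with the ToC rendered at the end (step 1)
    links: dict mapping link name -> 0-based target index in the original order
    """
    split = len(pages) - toc_length
    body, toc = pages[:split], pages[split:]

    def remap(index):
        if index < toc_position:      # before the ToC: unchanged
            return index
        if index < split:             # body page pushed back by the ToC
            return index + toc_length
        return toc_position + (index - split)  # a ToC page itself

    # Step 3: fix the links so that they point to the new positions.
    new_links = {name: remap(target) for name, target in links.items()}

    # Step 4: replace the placeholders with final 1-based page numbers.
    filled_toc = []
    for page in toc:
        for name, target in new_links.items():
            page = page.replace("{page:%s}" % name, str(target + 1))
        filled_toc.append(page)

    # Step 2: reorder the pages and put the ToC in place.
    return body[:toc_position] + filled_toc + body[toc_position:], new_links


pages = ["cover", "ch1", "ch2", "toc: ch1 {page:ch1}", "toc: ch2 {page:ch2}"]
new_pages, new_links = move_toc_into_place(pages, 2, 1, {"ch1": 1, "ch2": 2})
```

Because the ToC is rendered last, its length is known before any page number is written, so no page count has to be guessed up front.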

I think that this could well be included in the current implementation

yaminle avatar May 11 '23 08:05 yaminle

Hi @yaminle! Thanks for the feedback 😊

Would you like to contribute a PR regarding this?

Else, if that is more investment than you wish, could you maybe share the code you used? On GitHub or elsewhere

Lucas-C avatar May 11 '23 15:05 Lucas-C

Well - I would really like to. I'll do my best. I've got to study the guidelines first though ;)

yaminle avatar May 11 '23 16:05 yaminle

Great! Take your time and please ask any question you may have 😊

Lucas-C avatar May 11 '23 18:05 Lucas-C

Hi, has this been implemented yet? I found a "fix" for now, but it is really not perfect 😅.

Benoite142 avatar Sep 20 '24 13:09 Benoite142

@Benoite142 what's your fix?

alallier avatar Sep 20 '24 14:09 alallier

Hi! Pretty much what I do is count (by hand) how many lines fit on the first page of the TOC, and again on the following pages (46 for the first page and 48 for each of the following ones). I then check whether my list fits within those 46 lines. If not, I take the ceiling of the size minus the number of lines on the first page, divided by the number of lines on the other pages (48), and add 1 to that (for the first page).

I then pass the value I get in insert_toc_placeholder.

So pretty much it is:

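from math import ceil

# nb_of_lines_page_1 = 46 (first ToC page), nb_of_lines_else = 48 (following pages)
if len(data) <= nb_of_lines_page_1:
    pages = 1
else:
    pages = ceil((len(data) - nb_of_lines_page_1) / nb_of_lines_else) + 1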

So it is really not perfect: whenever I change some visual aspect, I need to recount how many lines fit on each page for the calculation to work again.

Benoite142 avatar Sep 20 '24 14:09 Benoite142

Hi,

the idea is to generate the TOC at the end of the document and reorder the pages - which is possible. The correct page numbers are then inserted instead of placeholders. Quite similar to the current implementation of the total number of pages. Unfortunately I am very busy at the moment - I'd really like to implement it actually in the source.

Let's see ...

Cheers, Armin

yaminle avatar Sep 20 '24 15:09 yaminle

A workaround would be to estimate the # of pages if the estimate is off one can use the # of pages the error returns to set the correct number. It allows for a max of 2 iteration

Hi, how can you do that without the 'A placeholder for the table of contents has already been defined' message appearing? I was able to get the number of pages from the first error message, but I get this other error message because I have already called insert_toc_placeholder.

Benoite142 avatar Nov 01 '24 14:11 Benoite142

Hey @Benoite142 @alallier

I am working on PR #1188 and I just created a reference implementation of Table of Contents that can handle adding extra pages. Would you be available to take a look at it, check whether it works for you, and let me know if you have any suggestions for improvement before I move to merge this PR?

To install the version from my branch you can do:

pip uninstall fpdf2
pip install git+https://github.com/andersonhc/fpdf2.git@page-number

The documentation is here: https://github.com/andersonhc/fpdf2/blob/page-number/docs/DocumentOutlineAndTableOfContents.md#reference-implementation

You can also check the test I created here:

  • https://github.com/andersonhc/fpdf2/blob/891f0c2cdbe32c2b347b097073b621e4cd51fc17/test/outline/test_outline.py#L427
  • https://github.com/andersonhc/fpdf2/blob/page-number/test/outline/toc_with_extra_page_0.pdf
  • https://github.com/andersonhc/fpdf2/blob/page-number/test/outline/toc_with_extra_page_1.pdf
  • https://github.com/andersonhc/fpdf2/blob/page-number/test/outline/toc_with_extra_page_2.pdf

andersonhc avatar Nov 07 '24 03:11 andersonhc

Hey @andersonhc,

Wow! This looks promising!

I'll try to find time today to look at it and test the changes for the toc placeholder pages.

Thanks for reaching out!

Benoite142 avatar Nov 07 '24 13:11 Benoite142

Hey @andersonhc ,

Sorry for taking so long to try it out,

But I just tried it and get very good results with it! I've used it on my big pdf generator and a small test aside and I didn't get any issue with it. Good job!

The only thing I am seeing is that the page number in my footer doesn't always display the right page, for some reason. On my 29-page PDF, I get the right number for page 1, then 27, 28, 29 for the rest of the TOC pages, and then 2, 3, ... for the content that is not in the TOC. I also show the corresponding page for each TOC entry with a link, and that page is wrong too, since the other TOC pages (2, 3, 4 but noted as 27, 28, 29) are in the wrong position.

But overall, great fix, I never encountered the fpdf.errors.FPDFException: The rendering function passed to FPDF.insert_toc_placeholder triggered to many page breaks: page 9 was reached while it was expected to span only 1 pages exception and that was my big problem.

There is just the page number issue, which can be tricky, but good job nonetheless. 😁

Benoite142 avatar Nov 14 '24 14:11 Benoite142

The reference implementation of Table of Contents made by @andersonhc has been released in fpdf2 2.8.2: https://py-pdf.github.io/fpdf2/DocumentOutlineAndTableOfContents.html#reference-implementation

Only thing I am seeing is that, the page number that I have in my footer doesn't always correctly display the right pages for some reason. On my 29 page pdf, I get the right page for page 1 and then I get 27, 28, 29 for the rest of the TOC pages, and then 2,3, ... for the content that is not in the TOC. I also show the corresponding pages for the content of the TOC with a link and the page is also wrong since the other pages from the TOC (2,3,4 but noted as 27,28,29) are in the wrong position.

If you want, @Benoite142, you could give us a minimal reproducible example of this annoying case, and we would be happy to take a look at it! 🙂

I think that we could close this issue?

Lucas-C avatar Jan 08 '25 12:01 Lucas-C

Sure, I'll try doing that tomorrow. The issue might already be fixed, as I haven't tried the code since I last commented.

And yes, you can close the issue. 😃

Benoite142 avatar Jan 08 '25 19:01 Benoite142

@Lucas-C thank you for all of your interaction in this thread over the past four years. I read the linked documentation you sent and it does in fact seem like you have solved what I originally opened this issue for. Unfortunately I no longer have my test case handy to test but based on the description of the changes it seems like it would work.

Before closing, though, I would recommend linking the PR where the fix was implemented, so future onlookers will have the full context. I agree with @Benoite142, I think it's safe to close the issue.

What @Benoite142 was discussing about footer numbers even seems to be covered by the note in the linked documentation. Regardless, as he stated, that's probably a new issue anyway.

Thanks to everyone who contributed over the past few years on this!

alallier avatar Jan 08 '25 20:01 alallier

What @Benoite142 is referring to is that as soon as the ToC spans more than 1 page the page numbers for the actual content are incorrect. The content always starts at 2.

Based on the documentation, it seems that page labels can help with that.

mschoettle avatar Jan 09 '25 15:01 mschoettle

I created #1343 for the page number issue.

mschoettle avatar Jan 09 '25 16:01 mschoettle