react-pdf

Table of Contents

Open skjo0c opened this issue 6 years ago • 15 comments

Is your feature request related to a problem? Please describe. I have a large set of data that will generate a pdf so I am looking for a way to generate TOC automatically.

There is a similar kind of functionality in pdfmake, which also uses pdfkit. I am looking for something similar in react-pdf.

skjo0c avatar Feb 07 '19 07:02 skjo0c

Hi, what do you mean by "generate TOC automatically"?

Assuming that you are asking for some kind of component that automatically generates a page with the ToC: I think react-pdf is a low-level library for creating PDFs in a declarative way, and this kind of component can be created using the building blocks already provided. As a proof of concept, see the attached project (read App.js and toc.js).

It would be wonderful if the community developed something like Twitter Bootstrap for this library, with commonly used components. Unfortunately, I do not have time to start a project like this :(

react-pd-test-toc.zip

dsvictor94 avatar Jun 19 '19 22:06 dsvictor94

This approach does not handle the page numbers in the ToC. You'd still need a third pass to get those, as the page numbers cannot be detected before the second pass finishes, unless you want to have the ToC at the end of the produced PDF.

LaTeX does three renderings to get the numbering right.
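To illustrate why a third rendering can be needed, here is a small simulation in TypeScript. It is only a sketch: `layout` is a hypothetical stand-in for a real layout engine (it is not part of react-pdf), and it assumes the ToC sits at the front of the document and grows by one page per `entriesPerTocPage` entries.

```typescript
type TocEntry = { title: string; pageNumber: number };
type Chapter = { title: string; pages: number };

// Hypothetical layout model: the ToC always takes at least one page, and
// every chapter starts on the page after the previous chapter ends.
function layout(chapters: Chapter[], toc: TocEntry[], entriesPerTocPage: number): TocEntry[] {
  const tocPages = Math.max(1, Math.ceil(toc.length / entriesPerTocPage));
  let nextPage = tocPages + 1;
  return chapters.map(({ title, pages }) => {
    const entry: TocEntry = { title, pageNumber: nextPage };
    nextPage += pages;
    return entry;
  });
}

const chapters: Chapter[] = [
  { title: 'chapter 1', pages: 2 },
  { title: 'chapter 2', pages: 3 },
  { title: 'chapter 3', pages: 1 },
];

// Pass 1: the ToC is still empty, so it fits on one page.
const pass1 = layout(chapters, [], 2);
// Pass 2: the ToC now has three entries and spills onto a second page,
// which shifts every chapter. The ToC printed in this pass still shows
// the numbers collected in pass 1.
const pass2 = layout(chapters, pass1, 2);
// Pass 3: the ToC length no longer changes, so the numbers printed in
// the ToC (from pass 2) match the actual layout.
const pass3 = layout(chapters, pass2, 2);
```

In this model the numbers only settle on the third pass because filling in the ToC changes its own length. If the ToC always fits in the space reserved for it, two passes are enough.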

vstirbu avatar Jun 20 '19 06:06 vstirbu

Thanks for that info @vstirbu ! Could you point me to where I can see in more detail how LaTeX does this?

diegomura avatar Jun 21 '19 18:06 diegomura

I totally forgot about the page numbering :disappointed:

I was thinking about it and came up with the idea of a "document event" that is emitted after layout but before painting, and that allows any listener of this event to call setState in reaction to layout variables (like the page number and the bounding rect).

This could deprecate the dynamic render (aka render prop) and allow much more complex features. But I don't think it is an easy thing to implement, because it would involve:

  • a document tree diff before painting
  • if something changed, detecting what needs a style recompute or a layout recompute
  • re-dispatching the event and doing everything again (potential infinite loop?)

There are some performance implications too, but I think it is impossible to solve the numbering problem and others (e.g. ensuring chapters start on even pages) without a performance penalty. This strategy would also move the performance considerations to the users (e.g. avoiding changes that cause a re-layout) when they consider this a problem.
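The re-dispatch loop from the list above, including a guard against the potential infinite loop, could be sketched roughly as follows. This is not react-pdf code: `layoutUntilStable`, `sameNumbers` and the `layoutPass` callback are hypothetical names, and the "diff" is reduced to comparing the page numbers from two consecutive passes.

```typescript
type PageNumbers = Record<string, number>;

// Minimal "diff": two passes agree when they report the same keys and values.
function sameNumbers(a: PageNumbers, b: PageNumbers): boolean {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return keysA.length === keysB.length && keysA.every((key) => a[key] === b[key]);
}

// Re-runs the layout until its output stops changing, or gives up after
// maxPasses so that an oscillating layout cannot loop forever.
function layoutUntilStable(
  layoutPass: (previous: PageNumbers) => PageNumbers,
  maxPasses: number,
): { stable: boolean; passes: number; pageNumbers: PageNumbers } {
  let previous: PageNumbers = {};
  for (let pass = 1; pass <= maxPasses; pass++) {
    const current = layoutPass(previous);
    if (sameNumbers(previous, current)) {
      return { stable: true, passes: pass, pageNumbers: current };
    }
    previous = current;
  }
  return { stable: false, passes: maxPasses, pageNumbers: previous };
}

// A layout that settles once the ToC has been filled in.
const converging = layoutUntilStable(
  (previous) => ({ intro: 2, details: Object.keys(previous).length > 0 ? 5 : 4 }),
  5,
);

// A layout that flips between two states and never settles.
const oscillating = layoutUntilStable(
  (previous) => ({ intro: previous['intro'] === 2 ? 3 : 2 }),
  5,
);
```

The cap on passes is the simplest possible answer to the infinite loop question; a real implementation would probably also want to report which nodes kept changing.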

dsvictor94 avatar Jun 21 '19 21:06 dsvictor94

Yes, there would definitely be a performance penalty for this case but not all pdf documents have this kind of formatting/style requirements.

If the step is opt in, the users would be aware of what it implies performance wise and can make an informed decision. I would assume that documents containing ToCs are generated in a batch fashion so the doc does not have to be available immediately.

If it makes things better, the last re-rendering applies only to Table of Contents/Images/Tables/etc. components, while the rest remain unchanged and might even have the layout cached...

@diegomura The LaTeX tooling hides the document generation process quite well these days, but the problem of getting the cross references right still exists. There is a brief explanation in the thread

vstirbu avatar Jun 22 '19 07:06 vstirbu

@diegomura Have there been any updates on this? Is it not possible right now to produce the page numbers in a TOC?

bharristn avatar Jul 02 '20 20:07 bharristn

Hi @diegomura, is there any update?

avneet2112 avatar May 20 '22 10:05 avneet2112

@diegomura Is there any update on support for table of contents?

I tried the approach from this comment but couldn't get it to work with page numbers.

I'm currently trying this approach and it collects the data including page numbers correctly. However, the component is not being rendered after the state is updated and therefore the page just stays blank. Do you have pointers on how I could change the script to render out the component after the table of contents data has been collected?

import { join as joinPath } from 'path'
import { useState } from 'react'
import { Document, Page, renderToFile, Text } from '@react-pdf/renderer'

export const Pdf = ({ chapters }: { chapters: string[] }): JSX.Element => {
  const [tableOfContentsChapters, setTableOfContentsChapters] = useState<{ title: string; pageNumber: number }[]>([])
  const tmpTableOfContentsChapters: { title: string; pageNumber: number }[] = []

  function setTableOfContentsChapter(chapter: { title: string; pageNumber: number }, isLastChapter: boolean): void {
    if (!tmpTableOfContentsChapters.some(({ title }) => title === chapter.title)) {
      tmpTableOfContentsChapters.push(chapter)
    }

    if (isLastChapter) {
      setTableOfContentsChapters(tmpTableOfContentsChapters)
    }
  }

  // The table of contents data is collected correctly on the second rerender.
  // However, it is not being rendered in the final pdf.
  console.log(tableOfContentsChapters)

  return (
    <Document>
      <Page>
        {tableOfContentsChapters.map((chapter) => (
          <Text>
            {chapter.title} - {chapter.pageNumber}
          </Text>
        ))}
      </Page>
      {chapters.map((chapter, index) => (
        <Page key={index}>
          <Text
            style={{
              fontSize: 11,
            }}
            render={({ pageNumber }) => {
              setTableOfContentsChapter(
                {
                  title: chapter,
                  pageNumber: pageNumber,
                },
                index === chapters.length - 1,
              )

              return chapter
            }}
            fixed
          />
        </Page>
      ))}
    </Document>
  )
}

async function generateEbook(): Promise<void> {
  const path = joinPath(__dirname, '..', 'ebooks', `test.pdf`)
  await renderToFile(<Pdf chapters={['chapter 1', 'chapter 2', 'chapter 3']} />, path)
}

generateEbook()

fschucht avatar May 22 '22 18:05 fschucht

@fschucht did you ever solve this? I am in exactly the same situation as you: once I have the page numbers, the component does not re-render as expected.

mohadib avatar May 26 '23 01:05 mohadib

@mohadib Yes, I managed to work around the issue by doing something like this:

const tableOfContentsChapters: { title: string; pageNumber: number }[] = []

// We render the ebook twice, first to collect the table of content chapters with page numbers,
// then to render the full ebook with a populated table of contents
await renderToString(<Ebook guide={guide} tableOfContentsChapters={tableOfContentsChapters} />)
await renderToFile(<Ebook guide={guide} tableOfContentsChapters={tableOfContentsChapters} />, filePath)

Then in each first page of a chapter, I added the current chapter to the global tableOfContentsChapters variable:

<ReactPDF.Page>
  <ReactPDF.View
    render={({ pageNumber }) => {
      if (!tableOfContentsChapters.some(({ title }) => title === chapter.title)) {
        tableOfContentsChapters.push({ title: chapter.title, pageNumber: pageNumber })
      }

      return null
    }}
  />
</ReactPDF.Page>

This way, the chapters got populated on the first render, and then were available on the second render.

fschucht avatar May 26 '23 06:05 fschucht

@fschucht we do the same thing. Do you have a solution for if the ToC ends up larger than a single page?

asgerhallas avatar May 26 '23 09:05 asgerhallas

@asgerhallas I didn't run into this case myself, so unfortunately I don't have a solution.

fschucht avatar May 26 '23 10:05 fschucht

@fschucht works great, thanks!

mohadib avatar May 26 '23 13:05 mohadib

I updated the solution from the second comment to React 18, but I got an error.

The toq renders:

transpile.js:122 🚀 ~ ToCProvider ~ toq: [] length: 0 [[Prototype]]: Array(0)
transpile.js:122 🚀 ~ ToCProvider ~ toq: (2) ['TITLE1', 'TITLE2'] length: 2 [[Prototype]]: Array(0)

I'm using

"react": "18.2.0",
"next": "13.4.13",
"react-dom": "18.2.0",
"react-pdf": "^5.3.2",
"raw-loader": "^4.0.2",
"@react-pdf/renderer": "^3.1.12",

My code is:

const ToCContext = createContext({
  toq: [],
  add: () => {},
});

const ToCProvider = ({ children }) => {
  const [toq, setToq] = useState([]);
  console.log("🚀 ~ ToCProvider ~ toq:", toq);

  const add = (title) => {
    if (!toq.includes(title)) {
      setToq((prevToq) => [...prevToq, title]);
    }
  };

  return (
    <ToCContext.Provider value={{ toq, add }}>{children}</ToCContext.Provider>
  );
};

const ToC = () => {
  const { toq } = useContext(ToCContext);

  return (
    <View>
      <Heading2>TABLE OF CONTENT</Heading2>
      <UnorderedList>
        {toq.map((item, index) => (
          <ListItem key={item}>
            {index + 1}. {item}
          </ListItem>
        ))}
      </UnorderedList>
    </View>
  );
};

const InnerHeading = ({ children, ...props }) => {
  const { add, toq } = useContext(ToCContext);
  const index = toq.indexOf(children);

  React.useEffect(() => {
    if (!toq.includes(children)) {
      add(children);
    }
  }, [add, children, toq]);

  return (
    <Text {...props}>
      {index + 1}. {children}
    </Text>
  );
};

const Heading = (props) => <InnerHeading {...props} />;

const Heading1 = ({ children }) => (
  <Heading
    level={1}
  >
    {children}
  </Heading>
);
const Heading2 = ({ children }) => (
  <Heading
    level={2}
  >
    {children}
  </Heading>
);

UnorderedList and ListItem are abstractions over Text. The <ToCProvider> component is the first component of <Document> and wraps the entire document.

But with this solution I always get an empty table of contents that renders only its title, and all headings are numbered 0 (0. TITLE1, 0. TITLE2, etc.). Neither the headings nor the <ToC /> render correctly.
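A possible explanation for the "0." prefix, assuming the render that ends up in the PDF still sees the initial empty toq: indexOf returns -1 for an item that is not in the array, so index + 1 evaluates to 0. A minimal TypeScript illustration, independent of react-pdf:

```typescript
// State as seen during the first render, before any heading was added.
const toq: string[] = [];
const heading = 'TITLE1';

// indexOf returns -1 when the item is not in the array.
const index = toq.indexOf(heading);

// Same expression as in InnerHeading: -1 + 1 gives 0.
const label = `${index + 1}. ${heading}`;
```

This would also be consistent with the empty table of contents, since both symptoms follow from the PDF being produced before the state update is picked up.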

I haven't yet implemented a way to render the <Heading> levels correctly.

@diegomura @fschucht @dsvictor94 @skjo0c

matbrgz avatar Aug 08 '23 17:08 matbrgz

I came up with a custom solution having a context that stores a table of contents state and the titles updating this state with their page number while rendering.

The page containing the table of contents reads the state and renders the titles and page numbers. Re-generation happens automatically due to state updates.

TableOfContentsContext.tsx

import { createContext, ReactNode, useState } from 'react';

export type TocEntry = { title: string; pageNumber: number; level: number }; // or whatever other properties you need to render your custom table of contents

type TocContextProps = {
  tableOfContents: TocEntry[];
  addToTableOfContents: (entry: TocEntry) => void;
};

export const TableOfContentsContext = createContext<TocContextProps>(
  null as unknown as TocContextProps,
);

export const TableOfContentsProvider = ({ children }: { children: ReactNode }) => {
  const [tableOfContents, setTableOfContents] = useState<TocEntry[]>([]);

  const addToTableOfContents = (entry: TocEntry) => {
    setTableOfContents((prevState) => {
      const entryExists = prevState.some(
        ({ title, pageNumber, level }) =>
          title === entry.title && pageNumber === entry.pageNumber && level === entry.level,
      );
      return entryExists ? prevState : [...prevState, entry];
    });
  };

  return (
    <TableOfContentsContext.Provider value={{ tableOfContents, addToTableOfContents }}>
      {children}
    </TableOfContentsContext.Provider>
  );
};

Usage: mark a title/text as relevant for table of contents:

PageHeader.tsx (or your title component)

const { addToTableOfContents } = useContext(TableOfContentsContext);

<Text
  id={title}
  render={({ pageNumber }) => {
    addToTableOfContents({
      title,
      pageNumber,
      level,
    });
    return title;
  }}
/>

Render the table of contents on the page of your choice:

TableOfContentsPage.tsx

const { tableOfContents } = useContext(TableOfContentsContext);

{tableOfContents.map(({ title, pageNumber, level }, index) => (
  <Link
    key={index}
    src={`#${title}`}
    style={{}} // custom style, which can also vary by level
  >
    <Text>{title}</Text>
    <Text>{pageNumber}</Text>
  </Link>
))}

Don't forget to wrap your <Document> with <TableOfContentsProvider>.
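The duplicate check inside addToTableOfContents can be pulled out into a pure function, which makes it easier to see why the state updates eventually stop. The sketch below uses hypothetical names (`addEntry`, `upsertEntry`). Returning the previous array unchanged matters because React skips the re-render when a state updater returns the same value. The `upsertEntry` variant is an assumption on top of the original: it treats titles as unique and replaces an entry whose page number has moved, instead of keeping both.

```typescript
type TocEntry = { title: string; pageNumber: number; level: number };

// Mirrors the check in addToTableOfContents: an identical entry leaves the
// array untouched and returns the very same reference.
function addEntry(entries: TocEntry[], entry: TocEntry): TocEntry[] {
  const exists = entries.some(
    (e) => e.title === entry.title && e.pageNumber === entry.pageNumber && e.level === entry.level,
  );
  return exists ? entries : [...entries, entry];
}

// Hypothetical variant, assuming titles are unique: if a title shows up
// again with a different page number, replace the stale entry.
function upsertEntry(entries: TocEntry[], entry: TocEntry): TocEntry[] {
  const current = entries.find((e) => e.title === entry.title);
  if (current === undefined) return [...entries, entry];
  if (current.pageNumber === entry.pageNumber && current.level === entry.level) return entries;
  return entries.map((e) => (e === current ? entry : e));
}

const first = addEntry([], { title: 'Intro', pageNumber: 2, level: 1 });
const repeated = addEntry(first, { title: 'Intro', pageNumber: 2, level: 1 });
const moved = upsertEntry(first, { title: 'Intro', pageNumber: 3, level: 1 });
```

With addEntry, a title that moves from page 2 to page 3 between renders would end up in the list twice; whether that can happen depends on how the table of contents itself affects the layout.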

maidi29 avatar Feb 23 '24 15:02 maidi29

The above is great, but it doesn't work in a server-side environment. Server-side is arguably easier, though, because you can just store a dictionary and pass it into the second render pass.

We introduced a ?multipass=true query parameter on our endpoint which, in our case, tells the API to do two render passes. The first pass takes an empty dictionary and fills it in; the second pass takes that dictionary and uses it to render the page numbers.

In TypeScript, as pseudo-code, it looks something like this:

let stream: NodeJS.ReadableStream;
if (!multipass) {
    stream = await finalRenderPass(
        new Map<string, SchemaPageFooterDetails>() // empty dictionary so page numbers are blank
    );
} else {
    const pageNumbersMap = await firstRenderPass(/** Your Parameters */); // returns page numbers from first pass
    stream = await finalRenderPass(
        /** Your other parameters */
        pageNumbersMap
    );
}
return stream;

And the firstRenderPass / finalRenderPass methods look something like this:

export type SchemaPageFooterDetails = { pageNumber: number; pageIdentifier: string };

/**
 * Renders the first pass of the PDF, this is necessary to calculate and store the page numbers in a map.
 * @returns A map of page numbers.
 */
export const firstRenderPass = async () => {
    const pageNumbersMap = new Map<string, SchemaPageFooterDetails>();
    await renderToStream(pageNumbersMap);
    return pageNumbersMap;
};

/**
 * Renders the final pass of the PDF, using the page numbers map from the first pass.
 *
 * @returns A stream of the final PDF.
 */
export const finalRenderPass = async (
    pageNumbersMap: Map<string, SchemaPageFooterDetails>
) => {
    return await renderToStream(pageNumbersMap);
};

/**
 * A helper function to render the PDF to a stream.
 *
 * @param pageNumbersMap The page numbers map to use for TOC generation.
 */
export const renderToStream = async (
    pageNumbersMap: Map<string, SchemaPageFooterDetails>
) => {
    return await ReactPDF.renderToStream(
        <PDFDocument
            pageNumbersMap={pageNumbersMap}
        />
    );
};

This has been stripped back because it had a bunch of business-specific logic, but hopefully the idea helps others in a similar scenario accelerate their implementation. It assumes a way to identify and number a page, and logic inside the component wrapping the render that fills in the dictionary :)
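The last assumption, the logic that fills in and reads the dictionary, might look something like the sketch below. `recordPage` and `pageLabel` are hypothetical helper names, not part of the stripped-back code above. The idea is that the first pass calls `recordPage` from wherever the page number is known (for example a render callback), and the final pass calls `pageLabel` when drawing the ToC.

```typescript
type SchemaPageFooterDetails = { pageNumber: number; pageIdentifier: string };

// First pass: remember which page an identified section landed on.
function recordPage(
  pageNumbersMap: Map<string, SchemaPageFooterDetails>,
  pageIdentifier: string,
  pageNumber: number,
): void {
  pageNumbersMap.set(pageIdentifier, { pageNumber, pageIdentifier });
}

// Final pass: look the number up, falling back to a blank label when the
// map is empty (the non-multipass case).
function pageLabel(
  pageNumbersMap: Map<string, SchemaPageFooterDetails>,
  pageIdentifier: string,
): string {
  const details = pageNumbersMap.get(pageIdentifier);
  return details === undefined ? '' : String(details.pageNumber);
}

const pageNumbersMap = new Map<string, SchemaPageFooterDetails>();
const beforeFirstPass = pageLabel(pageNumbersMap, 'summary'); // blank: nothing recorded yet
recordPage(pageNumbersMap, 'summary', 4);
const afterFirstPass = pageLabel(pageNumbersMap, 'summary');
```

Keeping the blank fallback means the same document component can be used for both passes and for the single-pass case.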

joelybahh avatar Mar 26 '24 00:03 joelybahh