docusaurus TOC does not work when importing one MDX into another

🐛 Bug Report

On our site we have a couple of duplicate doc pages. To avoid repeating content we have one doc as the source of truth then import the content like so:

import Content from './doc1.md';

<Content />

The issue is that the table of contents does not pick up the headings of the imported content on the pages that import it. Please let me know if there is a better way to handle duplicate pages, or if this usage is just conceptually wrong.

Have you read the Contributing Guidelines on issues?

Yes

To Reproduce

npx @docusaurus/init@latest init my-website classic

in doc2:

import Content from './doc1.md';

<Content />

It's worth noting that this also happens on my 6-month-old version, so I don't think it's due to anything recent.

Dec 14 '20 16:12 oriooctopus

This usage should be supported, but it's a current limitation of the existing system.

Unfortunately, I'm not sure how to implement this technically, and still busy on i18n.

In the meantime I'd recommend putting only raw text in the imported MDX and duplicating the headings in each doc. That's not ideal, but I don't have any other solution for now 😓

You could also add a build script that copies a file to multiple locations, or use a remark plugin that substitutes a placeholder with a content string in multiple docs.
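
The build-script idea could look roughly like the sketch below. It is only an illustration and not part of Docusaurus: the function name `copyDocToTargets` and the example paths are hypothetical, and it relies on nothing but Node's `fs` and `path` modules.

```javascript
// Hypothetical build helper (not a Docusaurus API): copy one source doc
// to several target paths so each copy gets its own statically computed TOC.
const fs = require('fs');
const path = require('path');

function copyDocToTargets(sourcePath, targetPaths) {
  for (const target of targetPaths) {
    // Create the destination folder if it does not exist yet.
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.copyFileSync(sourcePath, target);
  }
  // Return the number of copies written, mostly for logging.
  return targetPaths.length;
}
```

Such a helper would presumably run before the site build, for example `copyDocToTargets('docs/doc1.md', ['docs/guide/doc2.md'])`. The copies are byte-identical, so any front matter in the source (an explicit id, for instance) is duplicated as well and may need adjusting.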

Dec 14 '20 18:12 slorber

How difficult would it be to fix? I could possibly put up a PR

Dec 15 '20 01:12 oriooctopus

@oriooctopus unfortunately I suspect this to be quite complex to implement, and can't even give you much help to do so because I'm not sure how this should be done.

You can look at the code in this folder: packages/docusaurus-mdx-loader/src/remark/toc. If you are comfortable with ASTs and visitors, you can give it a try.

Dec 15 '20 08:12 slorber

@slorber I think we can address this by getting docusaurus-mdx-loader to follow jsx tags into their imported sources when traversing the AST.

I did something similar before, using the fs API to read the file and getting remark to parse it in place of the jsx node (so you can continue to recurse into the imported file and visit its nodes).

The only downside in my eyes is that we'd be using the node fs api, which means this package wouldn't work at runtime in the browser anymore (if it does now).

Do you think this would be an acceptable change?

Dec 16 '20 10:12 jknoxville

Yes @jknoxville this is what I have in mind: a recursive visitor that is able to "flatten" the TOC structure of multiple MDX docs (and preserve the correct TOC ordering).

This code already runs in node (it's a Webpack loader)

Dec 16 '20 15:12 slorber

@jknoxville do you still have any interest in fixing this? If not, maybe we could sync on what you had in mind and I might be able to implement it?

Jan 19 '21 19:01 oriooctopus

@oriooctopus I'm interested, but it's not top of my queue right now, so feel free to take a stab!

Here's a prototype I made a while ago, for a remark plugin that effectively re-implements mdx transclusions by parsing the target of any imported markdown files, and then replacing any JSX nodes that use it, with the nodes of the target AST. https://gist.github.com/jknoxville/8ed4cb8bcef348362a99dd70bb8006ae

It's scrappy, and not production-ready (currently uses eval for one thing - I'd recommend using babel to parse the imports instead), but it works. Maybe there's a more elegant way to do it.

What I imagine, is modifying search.js to do what that prototype does before it starts visiting headings to assemble its TOC.

I haven't tried this, but don't see why it wouldn't work.

Jan 20 '21 12:01 jknoxville

If that helps, I remembered that Gatsby has a way to extract all imports/exports from MDX.

https://github.com/gatsbyjs/gatsby/blob/2e42197025e2e1bac06c721c3cc44135bf8ef526/packages/gatsby-plugin-mdx/utils/gen-mdx.js#L199

Jan 20 '21 12:01 slorber

Can the table of content be overwritten using a React component?

I'm experiencing a similar issue where content generated via a React component is not parsed by the table of contents (for obvious reasons). That got me wondering whether there is a way to provide a custom table of contents for certain pages, which could help work around this issue.

Apr 30 '21 06:04 Laptopmini

@Laptopmini, I think extracting headings from React components is an edge case, more complicated and expensive (as it would require rendering the comp to see what's the output), so it's not likely the first thing we could build.

We could allow you to export a custom toc in the MDX document, or modify the default one we computed for you?

---
id: myDoc
---

export function toc(originalToc) {
  return [...originalToc,{value: 'My React heading',id: 'my react anchor id',children: []}]
}

# Title

## Subtitle

Lorem Ipsum

Does it look like a good workaround for your usecase?

Apr 30 '21 08:04 slorber

Unfortunately, the component I use in the page also makes use of network requests, so even if it is rendered, it won't be able to provide all its headings at mount time. Because of this, it would be beyond expensive to have the framework figure out the correct headings for the TOC.

The best solution would be to allow exporting the TOC's items as a stateful value, so that any context could update these values using the companion setter, causing the TOC to re-render with the correct values on changes. Alternatively, the doc could export a custom toc functional component, which the framework would then render in place of the TOC. This would allow wrapping the TOC in a parent component that controls it and its props (the items rendered), or completely replacing it with a custom component (which may still use the items as a prop).

Apr 30 '21 19:04 Laptopmini

I found a solution to my scenario, where I use a React portal to inject a custom table of contents, reusing the same structure and classNames as the current theme. The className usage is hacky and subject to breaking but it works well and should serve fine for the time being.

import React from 'react';
import ReactDOM from 'react-dom';

const TableOfContents = ({ items }) => {
  // Browser-only lookup of the TOC container rendered by the classic theme.
  const [parent] = document.getElementsByClassName(
    'tableOfContents_node_modules-@docusaurus-theme-classic-lib-next-theme-TOC-'
  );
  // Render nothing when there are no items or the container is missing.
  if (!items.length || !parent) return null;
  return ReactDOM.createPortal(
    <ul className="table-of-contents table-of-contents__left-border">
      {items.map(({ id }) => (
        <li key={id}>
          <a
            href={`#${id}`}
            className="table-of-contents__link table-of-contents__link--active"
          >
            {id}
          </a>
        </li>
      ))}
    </ul>,
    parent
  );
};

<TableOfContents items={[{ id: "example1" }, { id: "example2" }]} />

Then simply make sure the items you wish to link to from the table of contents include the following hidden anchor:

<a
  aria-hidden="true"
  tabIndex="-1"
  className="anchor enhancedAnchor_node_modules-@docusaurus-theme-classic-lib-next-theme-Heading-"
  id={id}
/>;

May 04 '21 22:05 Laptopmini

Hi @Laptopmini -- I would love to try to replicate your workaround to this issue, but I am not quite as savvy with respect to coding the backend. Could you provide some more specifics about where, exactly, you added the react component above, or provide a link to your repo where you have this working so I can look at it? Thanks!

May 05 '21 17:05 sweeneyskirt-sl

@sweeneyskirt-sl It seems that Docusaurus allocates a column in the page for a possible table of contents, regardless of whether the library actually creates one. You can verify this by checking for an element with the class tableOfContents_node_modules-@docusaurus-theme-classic-lib-next-theme-TOC- on the right-hand side of the page. It sits within the rightmost col in the DOM. The class name seems to depend on the theme you are using, so it is subject to change.

If you are using the same theme, or you adapt the definition of TableOfContents I wrote to target the right class, you can then use <TableOfContents items={[{ id: "example1" }, { id: "example2" }]} /> in a .mdx file, and it should inject a custom table of contents into the page based on the items you pass it.

Any place you wish to link to in the page then needs the <a> definition mentioned previously: a hidden anchor that the table of contents scrolls to when a link is clicked, which changes the window's location to include an ID such as #example1.

May 06 '21 23:05 Laptopmini

@Laptopmini what I understand is that you want to create a TOC dynamically from data fetched from an API.

This is unrelated to the current issue which is about importing one MDX doc inside another.

Please open another issue if you want to discuss dynamically generated TOC, and let's keep this issue focused on its initial problem. I believe you could just swizzle the TOC component and add your stateful logic here.

May 07 '21 10:05 slorber

Hi @slorber -- Thanks for your reply. I have a file that defines a number of configuration variables that are common to several environments, so I keep it as a single doc that I import into the other docs. I need each of the variables to be included in the floating TOC, which they are not, as confirmed by this issue. I see from your earlier response that fixing this issue is not a priority for Docusaurus right now -- do you know if it will be on the roadmap at any point in the near future? Are there any viable workarounds (the Gatsby plugin you mention, for example)? Manually adding a header for each variable is not viable for my content. It would defeat the purpose of the transclusion. Thank you for any advice!

May 07 '21 14:05 sweeneyskirt-sl

Any solution for this? @slorber @oriooctopus

Jun 08 '21 14:06 khushal87

I have not heard of any updates to this issue yet, @khushal87. I did throw in a feature request at https://docusaurus.canny.io/feature-requests/p/include-transclusion-content-headings-in-toc so please feel free to vote it up so we can get this on the roadmap at least! :)

Jun 08 '21 14:06 sweeneyskirt-sl

Done, hope to get this soon. @sweeneyskirt-sl :crossed_fingers:

Jun 08 '21 14:06 khushal87

Some related notes:

We added better support for the _ prefix convention so that doc partials are not creating additional routes: https://github.com/facebook/docusaurus/pull/5173
Instead of using MDX imports (which don't work with the TOC), we may explore a way to include one file's content into another. MDX would then process the md content as a whole (1 final React component instead of several) and be able to compute the appropriate TOC. This solution has been mentioned here already: https://github.com/facebook/docusaurus/issues/599#issuecomment-605847247 Please let me know if that could work for your use case.

Jul 15 '21 13:07 slorber

A temporary workaround could be to export your own TOC manually, possibly re-using the TOC exported by the MDX partials you used.

I've used this trick on our changelog page, that imports the changelog from the root of the repo: https://github.com/facebook/docusaurus/pull/5331

This can work well particularly if your existing doc is just an "index" for many imported docs and does not declare its own headings:

import Chapter1, {toc as Chapter1TOC} from "./_chapter1.md"
import Chapter2, {toc as Chapter2TOC} from "./_chapter2.md"

<Chapter1 />

<Chapter2 />

export const toc = [...Chapter1TOC, ...Chapter2TOC];

This becomes less usable once you add headings between the 2 imported chapters.

Aug 10 '21 17:08 slorber

One open question is how would you treat the insert's heading levels? Markdown does not natively allow for relative heading levels like some other specialized authoring tools.

For example, if you have an H1 in your insert, would you always insert it as an H1, or would you transform it to an H2 if inserted below an H1? In the first case, we could break the text flow. In the second, we risk not guessing the target level right.

Jan 25 '22 14:01 boevski

We don't make any assumptions. You insert an H1 heading, and it is an H1 heading. We call these imported files "partials" because, well, they are just "partials".

Jan 25 '22 14:01 Josh-Cena

This can work well particularly if your existing doc is just an "index" for many imported docs and does not declare its own headings:

import Chapter1, {toc as Chapter1TOC} from "./_chapter1.md"
import Chapter2, {toc as Chapter2TOC} from "./_chapter2.md"

<Chapter1 />

<Chapter2 />

export const toc = [...Chapter1TOC, ...Chapter2TOC];

Is this working with a partial imported in a page that already has a ToC? I've tried this but the ToC from the partial isn't added to the doc page's ToC. Is there a way to do that?

Feb 14 '22 23:02 ltribolet

@ltribolet If the page you are importing the partial into already has a TOC, then no. You would probably need something like what is discussed in #6201 to do it.

Feb 15 '22 01:02 Josh-Cena

A temporary workaround could be to export your own TOC manually, possibly re-using the TOC exported by the MDX partials you used.

I've used this trick on our changelog page, that imports the changelog from the root of the repo: #5331

This can work well particularly if your existing doc is just an "index" for many imported docs and does not declare its own headings:

import Chapter1, {toc as Chapter1TOC} from "./_chapter1.md"
import Chapter2, {toc as Chapter2TOC} from "./_chapter2.md"

<Chapter1 />

<Chapter2 />

export const toc = [...Chapter1TOC, ...Chapter2TOC];

This becomes less usable once you add headings between the 2 imported chapters.

It is working.

Mar 19 '22 07:03 Yukiniro

After #6729, we are in a better position to fix this problem. If anyone wants to take a look at this issue, here's how we should solve it:

The TOC remark plugin should recognize import statements that import a Markdown document, and add a named import called toc if it's not already imported. e.g.

  // Non-MD files are ignored
  import Partial1 from "./Partial1.jsx";
  // Imports already containing a `toc` name are kept as-is
  import Partial2, { toc as Partial2TOC } from "./Partial2.mdx";
  import Partial3, { toc } from "./Partial3.md";
  // Automatically append a `toc` import. The name must be unique to avoid conflicts.
  // I think we can simply use "__toc" + name of the partial component
- import Partial4 from "./Partial4.mdx";
+ import Partial4, { toc as __tocPartial4 } from "./Partial4.mdx";
- import Partial5, { frontMatter } from "./Partial5.md";
+ import Partial5, { frontMatter, toc as __tocPartial5 } from "./Partial5.md";

Simultaneously, this traversal step needs to record the names of all Markdown components in scope.
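
To make this first step more concrete, here is a rough sketch. It is deliberately simplified: a regex stands in for the Babel-based import parsing described in this comment, and it assumes one import per line with a default specifier. The function name `addTocImports` and the `partials` map are hypothetical, not existing Docusaurus code.

```javascript
// Simplified sketch: a regex replaces real import parsing (assumption).
// Matches `import Name from "file"` and `import Name, { a, b } from "file"`.
const IMPORT_RE = /^import\s+(\w+)\s*(?:,\s*\{([^}]*)\})?\s*from\s*(["'])(.+?)\3;?\s*$/;

function addTocImports(source) {
  const partials = {}; // component name -> local name of its imported TOC
  const lines = source.split('\n').map((line) => {
    const match = IMPORT_RE.exec(line);
    // Non-import lines and non-Markdown imports are left untouched.
    if (!match || !/\.mdx?$/.test(match[4])) return line;
    const [, name, named, quote, file] = match;
    const specifiers = (named || '').split(',').map((s) => s.trim()).filter(Boolean);
    // Look for an existing `toc` or `toc as Alias` specifier.
    const existing = specifiers
      .map((s) => /^toc(?:\s+as\s+(\w+))?$/.exec(s))
      .find(Boolean);
    if (existing) {
      partials[name] = existing[1] || 'toc';
      return line;
    }
    // Otherwise append a uniquely named `toc` import.
    const tocName = `__toc${name}`;
    partials[name] = tocName;
    specifiers.push(`toc as ${tocName}`);
    return `import ${name}, { ${specifiers.join(', ')} } from ${quote}${file}${quote};`;
  });
  return { code: lines.join('\n'), partials };
}
```

The returned `partials` map doubles as the record of Markdown components in scope, which the visitor step can consult later.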

The TOC collection part (the visitor), in addition to visiting heading nodes, should also keep track of where these Markdown partials are rendered. When it encounters <Partial1 /> where Partial1 is a Markdown component (known from the previous step), it inserts a dummy placeholder in the toc list telling the codegen step that "this is where we should insert the TOC from the partial".

In order to know the name of the component being used, we need to parse the JSX expression with, say, Babel, since a jsx node has its content perfectly preserved. This should be fine, since JSX in Markdown is usually lightweight, so performance is not sacrificed much. We can look into using SWC later. The TOC algorithm already uses Babel to parse export statements and find existing exports called toc.

Note that this may not be as trivial as it seems. Consider the following case:

<div>

Some content

</div>

Instead of seeing one JSX, the visitor will see three nodes, with the opening tag and the closing tag being different nodes. If we naively pass <div> to Babel, the parser will fail. We will need to add a closing tag to let it parse gracefully.

Alternatively, we can be bold and assume that all Markdown partials are their own nodes (i.e. on their own paragraphs, or at least not wrapped by other JSX nodes like <div><Partial1 /></div>). In this case, we may get away with simple regex parsing.
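
Under that bold assumption, the regex route might look like the sketch below. The helper name `findPartialTocName` is hypothetical, and `partials` is assumed to be a plain object mapping each Markdown component name to the local name of its imported TOC. Anything that is not exactly one self-closing tag is simply ignored.

```javascript
// Simplified sketch: a jsx node counts as a partial only when its whole
// value is a single self-closing tag such as `<Partial1 />`.
const SELF_CLOSING_RE = /^<([A-Z]\w*)(?:\s[^<>]*)?\/>$/;

function findPartialTocName(jsxValue, partials) {
  const match = SELF_CLOSING_RE.exec(jsxValue.trim());
  if (!match) return null;
  // Only components recorded as Markdown partials produce a placeholder.
  const tocName = partials[match[1]];
  return tocName === undefined ? null : tocName;
}
```

A wrapped usage such as `<div><Partial1 /></div>` returns null here, which is exactly the limitation this shortcut accepts.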

Now that we have collected the TOC list, which looks something like:

const toc = [
  {
    value: 'Thanos',
    id: 'thanos',
    level: 2,
  },
  {
    value: 'Tony Stark',
    id: 'tony-stark',
    level: 2,
  },
  {
    name: 'Partial2TOC',
  },
  {
    value: 'Avengers',
    id: 'avengers',
    level: 3,
  },
  {
    name: '__tocPartial4',
  },
];

The last step is relatively easy. During codegen, we transform all exotic TOC nodes to a spread:

export const toc = [
  {
    value: 'Thanos',
    id: 'thanos',
    level: 2,
  },
  {
    value: 'Tony Stark',
    id: 'tony-stark',
    level: 2,
  },
  ...Partial2TOC,
  {
    value: 'Avengers',
    id: 'avengers',
    level: 3,
  },
  ...__tocPartial4,
];

And voila! There we get a fully functional TOC. Note the merits of this approach: we never actually traverse the imported Markdown partial. The exotic TOCs remain as opaque data structures.
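
The codegen step could be sketched as follows. This is an illustration only: `generateTocExport` is a hypothetical name, and it assumes placeholders are the entries carrying a string `name` field, as in the list above, while regular headings never have one.

```javascript
// Turn the collected list into the source text of an `export const toc`
// statement. Placeholder entries become spreads of the imported TOC.
function generateTocExport(items) {
  const entries = items.map((item) =>
    typeof item.name === 'string' ? `...${item.name}` : JSON.stringify(item)
  );
  return `export const toc = [${entries.join(', ')}];`;
}
```

Because the placeholder is emitted as an identifier rather than resolved, the partial's TOC stays opaque until the generated module is evaluated.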

The main technical difficulties lie in parsing the import statements, adding extra import specifiers, and parsing the JSX nodes. Community contributions welcome.

Mar 23 '22 08:03 Josh-Cena

Any updates on this? @Josh-Cena

Jun 20 '22 15:06 shahriarpshuvo

As the labels indicate, this is welcoming community contributions. As the milestone indicates, this will not be pursued by the team before the 2.0 release. As a general etiquette, please do not create bumper comments because they only pollute the notifications of everyone who subscribed to this thread. All development of Docusaurus is done through GitHub, so any progress is evident by observing the conversation.

Jun 20 '22 15:06 Josh-Cena

import Content from './doc1.md';

Correct me if I'm wrong: the document gets imported as an MDXComponent, with headings already rendered as h2 elements, so no TOC entries are generated for them?

I suspect the eventual upgrade to mdxjs2 will solve this because of how markdown will be treated differently within JSX blocks.

Jun 28 '22 02:06 o-v

No. The reason there's no TOC is that TOC generation is a Remark plugin that statically analyzes the Markdown document, instead of actually running it and collecting the h2 headings present in the generated JSX. So my proposed solution above involves collecting the incoming toc variable of ./doc1.md, and inserting it into the current file's TOC.
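
A toy illustration of that point, under loud assumptions: the real plugin walks a remark AST, whereas this sketch uses a line-based regex and only looks at level 2 and 3 headings. The function name `collectHeadings` is hypothetical.

```javascript
// Toy static collector: it only sees headings written in this file's text.
function collectHeadings(markdown) {
  const headings = [];
  for (const line of markdown.split('\n')) {
    const match = /^(#{2,3})\s+(.+)$/.exec(line);
    if (match) {
      headings.push({ level: match[1].length, value: match[2].trim() });
    }
  }
  return headings;
}
```

Given a doc that contains only an import and `<Content />`, this returns an empty list, because the headings live in the imported file and are never part of the text being analyzed.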

Jun 28 '22 02:06 Josh-Cena

@Josh-Cena I understand. The TOC is a crucial part of a documentation website, and MDX is intended to be split across multiple files. I wonder why, although someone raised this issue long ago, there has been no progress so far. I'm not experienced enough to contribute right now, but I believe this deserves more priority.

Jun 30 '22 09:06 shahriarpshuvo
