docusaurus New Docusaurus plugin: `docusaurus-plugin-llms-txt`

Have you read the Contributing Guidelines on issues?

[X] I have read the Contributing Guidelines on issues.

Description

Hello!

I'd like to propose a new Docusaurus plugin that generates an llms.txt file.

As background, an llms.txt file collects the information an LLM needs about a site in one place, so that it can process the site's pages quickly and effectively. More information can be found on the llms.txt proposal website (https://llmstxt.org).

The goal of this feature is to create a plugin that will generate this file for your docs.

Has this been requested on Canny?

No

Motivation

Having an llms.txt is a business need for our company, but I'm certain it would be helpful for others. I have already created a proof-of-concept plugin that writes the file, but I wanted to bring this to the community as a whole to see how the design could be improved and/or match others' use cases.

Additionally, many docs sites that I use and admire, such as dub.co and Cloudflare, have already implemented llms.txt files: the former via Mintlify, the latter via a script.

Adding a new, optional plugin would give Docusaurus users the ability to opt-in to this new standard and allow their documentation to be accessed easily by LLM agents.

API design

The design is very similar to the sitemap plugin.

Configuration

ignorePatterns: exclude specific paths from the llms.txt.
filename: change the name and output path of the llms.txt file.

I also considered options for customizing the file's output format, but the llms.txt proposal is fairly prescriptive about the file structure, so I decided against adding them.
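As a rough illustration of how the two options could behave, here is a minimal sketch. The `LlmsTxtOptions` shape, the `isIgnored` helper, and the tiny glob subset (`*` and `**` only) are assumptions made for this example, not the proof-of-concept's actual implementation.

```typescript
// Hypothetical option shape mirroring the two options proposed above.
interface LlmsTxtOptions {
  ignorePatterns: string[]; // glob-like patterns, e.g. "/blog/**"
  filename: string; // output path relative to the build directory
}

// Convert a very small glob subset ("*" and "**") into a RegExp.
function globToRegExp(pattern: string): RegExp {
  const source = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&') // escape regex metacharacters
    .replace(/\*\*/g, '\u0000') // temporarily mark "**"
    .replace(/\*/g, '[^/]*') // "*" stays within one path segment
    .replace(/\u0000/g, '.*'); // "**" may span several segments
  return new RegExp(`^${source}$`);
}

// True when the route path matches at least one ignore pattern.
function isIgnored(routePath: string, options: LlmsTxtOptions): boolean {
  return options.ignorePatterns.some((pattern) =>
    globToRegExp(pattern).test(routePath),
  );
}
```

With `ignorePatterns: ['/blog/**']`, a route such as `/blog/2025/post` would be left out of the file while `/docs/intro` would be kept. A real plugin would more likely reuse an existing glob matcher.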

Have you tried building it?

Yes! I made a simple plugin that works for our use case. I would like to move it into this repo in order to make it available to other folks and to get feedback on the design. I feel it could be made much simpler and cleaner 😄

Self-service

[X] I'd be willing to contribute this feature to Docusaurus myself.

Feb 04 '25 14:02 jharrell

We usually add to the Docusaurus core repo what we use on our own website. In this case it's not really a need we have, and adding this to core would put the burden on us to maintain it over the long term. Many people want to add their plugin to core, but we can't do that; otherwise we'd end up with too many packages to maintain. I'd prefer to keep a lean core and keep this plugin in the community.

Also, I'm really not sure what this plugin adds to the regular sitemap.xml file.

I've checked your POC (https://github.com/prisma/docs/pull/6645), and as far as I understand it basically creates a list of links in Markdown format, but using a .txt extension instead of .md (why not use llms.md? 🤷‍♂)

Result: https://jharrell-llms-plugin.docs-51g.pages.dev/llms.txt

If we look at the Cloudflare example, it's also a list of links in Markdown format. https://developers.cloudflare.com/llms.txt

I'm not sure how the h2 headings and occasional labels bring a lot of value to a regular sitemap.
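For readers unfamiliar with the file being discussed, the sketch below renders the general shape described by the llms.txt proposal: an H1 title, a blockquote summary, then H2 sections containing Markdown link lists with optional notes. The `LlmsLink` and `LlmsSection` types and the `renderLlmsTxt` function are hypothetical names used only for illustration.

```typescript
// Hypothetical types for this illustration.
interface LlmsLink {
  title: string;
  url: string;
  note?: string; // the optional label after the link
}

interface LlmsSection {
  heading: string;
  links: LlmsLink[];
}

// Render an llms.txt-style document: H1, blockquote summary, H2 link lists.
function renderLlmsTxt(
  siteTitle: string,
  summary: string,
  sections: LlmsSection[],
): string {
  const lines: string[] = [`# ${siteTitle}`, '', `> ${summary}`, ''];
  for (const section of sections) {
    lines.push(`## ${section.heading}`, '');
    for (const link of section.links) {
      const note = link.note ? `: ${link.note}` : '';
      lines.push(`- [${link.title}](${link.url})${note}`);
    }
    lines.push('');
  }
  return lines.join('\n');
}
```

Compared with a sitemap, the only extra information carried here is the grouping under headings and the per-link notes, which is the point being questioned above.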

Also, your site recommends linking to html.md files, which you don't do here, so the LLM would have to read an HTML page anyway. And we use MDX, which means that this format is already less "LLM-friendly" than regular Markdown files, which cannot contain React components that must be evaluated.

This site is linking to Markdown files, for example: https://docs.fastht.ml/llms.txt

This field is not my expertise but I don't find this proposal super convincing, it looks more like an early draft not covering edge cases than a spec. I don't think it should be added to Docusaurus core until the proposal gets more mainstream adoption and gets better documented. I'd also prefer if there was first a community implementation that satisfies early adopters. We can consider adding this to our repo later, but not as a first step. Until then I'm happy to support you developing a community plugin, so let me know if you are unable to achieve your needs.

I also wonder why it has to be a Docusaurus plugin. You could build a generic CLI tool that handles a full generic static deployment, looking for sitemap.xml or HTML files and creating LLM-friendly content.

Feb 06 '25 10:02 slorber

Thank you @slorber for the context. I don't disagree with your assessment, so I'll be closing this proposal.

For further information about why this over a sitemap etc there's this section of the proposed standard: https://llmstxt.org/#existing-standards

Thank you for the feedback, though. I'll continue to use my simplified solution.

Feb 06 '25 14:02 jharrell

@jharrell could you share your "simplified solution", this does sound useful :)

Feb 19 '25 14:02 DenhamPreen

@DenhamPreen sure. We're still iterating but here's the latest: https://github.com/prisma/docs/blob/22208d52e4168028dbbe8b020b10682e6b526e50/docusaurus.config.ts#L95

It generates an llms-full.txt and llms.txt.

Happy to chat further about it if I can help, my bsky account is in my GitHub profile 😊

Feb 19 '25 14:02 jharrell

Considering the number of issues linking back to this one, I think we should consider implementing such a plugin in Docusaurus core, even though I'm still not super convinced by the usefulness of this file.

See some ideas on how this can be implemented here: https://github.com/facebook/docusaurus/discussions/11191#discussioncomment-13244061

Note: many have copied the Prisma implementation, which works, but IMHO it's not ideal. We only generate llms.txt in prod builds, so it's preferable to read the dirs and md source files there instead of inside loadContent() (which is more useful in dev to support hot reload).

May 23 '25 08:05 slorber

Plugin design?

In my opinion, the problems we have when generating such an llms file are:

which routes/source files should be listed
how to group routes together under Markdown headings of various levels
how do we order the headings
how do we order the links within each group

We already expose some metadata through the postBuild({routes, routesBuildMetadata}) parameters.

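import type {LoadContext, Plugin} from '@docusaurus/types';
// flattenRoutes is assumed to be the helper exported by @docusaurus/utils.
import {flattenRoutes} from '@docusaurus/utils';

// Placeholder: this sketch does not read any options yet.
type PluginOptions = Record<string, unknown>;

export default function pluginLlms(
  context: LoadContext,
  options: PluginOptions,
): Plugin<void> | null {
  return {
    name: 'docusaurus-plugin-llms',
    // postBuild runs once the production build has been generated.
    async postBuild({routes, routesBuildMetadata}) {
      // Flatten the nested route tree into the list of leaf routes.
      const finalRoutes = flattenRoutes(routes);
      finalRoutes.forEach((route) => {
        const sourceFilePath = route.metadata?.sourceFilePath;
        if (sourceFilePath) {
          console.log(
            `Route ${route.path} was created from markdown file ${sourceFilePath}`,
          );
        }
      });
    },
  };
}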

We could easily enrich these with extra metadata that could be useful to generate the llms files (title, breadcrumb, other metadata).
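To make the route-to-source mapping concrete without depending on Docusaurus itself, here is a self-contained sketch. `RouteLike` is a minimal stand-in for the real route objects (only the fields used above), and `collectLeafRoutes` is a simplified substitute for `flattenRoutes`, so treat both as assumptions.

```typescript
// Minimal stand-in for Docusaurus route objects (assumption: only these fields).
interface RouteLike {
  path: string;
  routes?: RouteLike[];
  metadata?: {sourceFilePath?: string};
}

// Collect leaf routes, i.e. routes without nested subroutes.
function collectLeafRoutes(routes: RouteLike[]): RouteLike[] {
  const leaves: RouteLike[] = [];
  for (const route of routes) {
    if (route.routes && route.routes.length > 0) {
      leaves.push(...collectLeafRoutes(route.routes));
    } else {
      leaves.push(route);
    }
  }
  return leaves;
}

// Map each leaf route path to the source file it was created from, if known.
function mapRoutesToSources(routes: RouteLike[]): Map<string, string> {
  const result = new Map<string, string>();
  for (const route of collectLeafRoutes(routes)) {
    const source = route.metadata?.sourceFilePath;
    if (source) {
      result.set(route.path, source);
    }
  }
  return result;
}
```

Extra metadata such as a title or breadcrumb could be carried in the same map value if the route metadata were enriched as suggested.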

Now, I'm not sure what kind of API the plugin could expose to define how Markdown files are grouped, and how we define an explicit order (does it even matter to LLMs?).

I guess we could provide a callback so that you can define the Markdown breadcrumb for each file, but that wouldn't allow you to order things:

['llms-plugin',{
  getMarkdownBreadcrumb: ({title, sourceFilePath}) => {
    if (isBlog(sourceFilePath)) {
      return ["Blog"];
    }
    if (isiOSDocs(sourceFilePath)) {
      return ["Docs","iOS"];
    }
    if (isAndroidDocs(sourceFilePath)) {
      return ["Docs", "Android"];
    }
    return null;
  }
}]

Does it make sense?
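To see what such a callback would and would not give us, here is a small sketch that groups pages by the returned breadcrumb and renders one heading per group. The `Page` type and `renderGroupedLinks` are hypothetical; the point is that groups simply appear in first-seen order, which illustrates the ordering limitation mentioned above.

```typescript
// Hypothetical page shape for this illustration.
interface Page {
  title: string;
  url: string;
  sourceFilePath: string;
}

type GetMarkdownBreadcrumb = (page: Page) => string[] | null;

function renderGroupedLinks(
  pages: Page[],
  getMarkdownBreadcrumb: GetMarkdownBreadcrumb,
): string {
  // A Map keeps insertion order, so groups are emitted in first-seen order.
  const groups = new Map<string, string[]>();
  for (const page of pages) {
    const breadcrumb = getMarkdownBreadcrumb(page);
    if (breadcrumb === null) continue; // null excludes the page entirely
    const heading = breadcrumb.join(' / ');
    const links = groups.get(heading) ?? [];
    links.push(`- [${page.title}](${page.url})`);
    groups.set(heading, links);
  }
  const lines: string[] = [];
  groups.forEach((links, heading) => {
    lines.push(`## ${heading}`, '', ...links, '');
  });
  return lines.join('\n');
}
```

Controlling the order of headings or of links within a group would need an additional option, for example an explicit list of headings or a sort callback.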

It would be useful if the community provided concrete examples. Given a specific sample site, what kind of llms.txt file do you expect it to output, and why?

The more diverse examples we have, the easier it becomes to design an API that suits all needs.

May 23 '25 08:05 slorber

I also created a plugin: https://github.com/rachfop/docusaurus-plugin-llms

May 23 '25 18:05 rachfop

I also started working on our own version of the plugin: https://github.com/signalwire/docs/pull/290

I tried to mimic Stripe's behavior 1:1, where a .md version of every doc is also generated at the same route. So for most routes you can just append .md and get the raw version of the doc.
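The "append .md to the route" convention can be sketched as a small path mapping. `routeToMarkdownFile` is a hypothetical helper, and mapping the site root to `index.md` is an assumption of this example rather than something the plugin is known to do.

```typescript
// Map a route path to the Markdown file emitted next to it in the build output.
function routeToMarkdownFile(routePath: string): string {
  // Drop leading and trailing slashes so "/docs/ai/" and "/docs/ai" agree.
  const trimmed = routePath.replace(/^\/+|\/+$/g, '');
  // Assumption: the site root is written as index.md.
  return trimmed === '' ? 'index.md' : `${trimmed}.md`;
}
```

For example, `/docs/ai` and `/docs/ai/` both map to `docs/ai.md`.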

I also opted to do the work in a postBuild action and convert the rendered HTML back to Markdown. I did this because I didn't want to deal with components and partials in the source content.

I also added some built-in rehype plugins to handle known edge cases (like lists in tables) and to update links to match the URL options set in the config (relative path, full URL, generated Markdown doc).

I tried to make it as flexible as possible to work with any current Docusaurus website via the contentSelectors property.

Example preview can be seen here: https://deploy-preview-290--signalwire-docs.netlify.app/llms.txt

Because it's not pushed to our main site yet (and I haven't implemented the URL validation logic yet), don't use the full URLs, just the relative portion (e.g. don't use https://example/com/ai.md, use /ai.md).

May 23 '25 22:05 Devon-White

I think an official LLMs.txt plugin would be great. Here's how I believe it should work:

Proposed Build Process

1. llms.txt Generation

Scan all content directories (/docs, /blog, etc.) for .md files
Generate a hierarchical tree structure based on file paths and frontmatter metadata (see Vercel llms.txt)
Use frontmatter description field or a new llms_description metadata attribute for page descriptions
The llms.txt files should be located at the root of the content. So /docs/llms.txt, /blog/llms.txt
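As a minimal sketch of the "hierarchical tree based on file paths" idea, the code below builds a tree from relative file paths and renders it as an indented list. The `TreeNode` type and both function names are hypothetical, and frontmatter metadata is deliberately left out.

```typescript
// Hypothetical tree node: a folder or file keyed by its path segment.
interface TreeNode {
  children: Map<string, TreeNode>;
}

// Insert each path segment by segment, creating nodes as needed.
function buildTree(filePaths: string[]): TreeNode {
  const root: TreeNode = {children: new Map()};
  for (const filePath of filePaths) {
    let node = root;
    for (const segment of filePath.split('/').filter((s) => s.length > 0)) {
      let child = node.children.get(segment);
      if (!child) {
        child = {children: new Map()};
        node.children.set(segment, child);
      }
      node = child;
    }
  }
  return root;
}

// Render the tree as an indented list, sorted alphabetically at each level.
function renderTree(node: TreeNode, depth = 0): string[] {
  const lines: string[] = [];
  const names = Array.from(node.children.keys()).sort();
  for (const name of names) {
    lines.push(`${'  '.repeat(depth)}- ${name}`);
    lines.push(...renderTree(node.children.get(name) as TreeNode, depth + 1));
  }
  return lines;
}
```

Alphabetical sorting is only a placeholder here; a real implementation would probably follow sidebar positions or frontmatter instead.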

2. Raw Markdown File Generation

The llms.txt standard recommends providing an .md file for every page, which the main llms.txt can link to.

Every page needs a cleaned Markdown file, with frontmatter and JSX stripped, at {page-url}.md
Example: /docs/integrations/react → /docs/integrations/react.md
Maintain internal links but convert them to reference other .md files (if that is even possible)
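A very rough sketch of the cleaning step could look like the following. It is regex-based and only handles a leading YAML frontmatter block, top-level `import`/`export` lines, and self-closing component tags that sit on their own line; `cleanMdx` is a hypothetical name, and a robust version would need a real MDX parser (this one would, for instance, also strip `import` lines inside code fences).

```typescript
// Strip frontmatter, ESM import/export lines, and standalone self-closing
// component tags from an MDX source string. Regex-based sketch only.
function cleanMdx(source: string): string {
  // Remove a leading YAML frontmatter block delimited by "---" lines.
  const withoutFrontmatter = source.replace(
    /^---\r?\n[\s\S]*?\r?\n---\r?\n?/,
    '',
  );
  const kept = withoutFrontmatter
    .split('\n')
    // Drop MDX import/export statements that start a line.
    .filter((line) => !/^(import|export)\s/.test(line))
    // Drop lines made of one self-closing component, e.g. <Badge text="new" />.
    .filter((line) => !/^\s*<[A-Z][\w.]*(\s[^>]*)?\/>\s*$/.test(line));
  // Collapse the blank-line runs left behind and normalize the ending.
  return kept.join('\n').replace(/\n{3,}/g, '\n\n').trim() + '\n';
}
```

Components that wrap content (tabs, admonitions, partials) are not handled here, which is one reason other implementations in this thread convert the rendered HTML back to Markdown instead.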

3. llms-full.txt Generation

Combine all the .md files from step 2 into an llms-full.txt file, also at the root of the content. So /docs/llms-full.txt, /blog/llms-full.txt
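Combining the per-page Markdown into a single file could be as simple as the sketch below. The `MarkdownDoc` type, the `buildLlmsFull` name, and the HTML-comment separator carrying the source URL are all assumptions of this example; the proposal does not prescribe how pages are delimited.

```typescript
// Hypothetical shape of an already-cleaned Markdown page.
interface MarkdownDoc {
  url: string;
  content: string;
}

// Concatenate pages in a stable order, each preceded by a source marker.
function buildLlmsFull(docs: MarkdownDoc[]): string {
  return docs
    .slice() // avoid mutating the caller's array
    .sort((a, b) => (a.url < b.url ? -1 : a.url > b.url ? 1 : 0))
    .map((doc) => `<!-- source: ${doc.url} -->\n\n${doc.content.trim()}\n`)
    .join('\n');
}
```

Sorting by URL keeps the output deterministic between builds, which makes diffs of the generated file easier to review.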

I would be happy to give it a shot.

Jun 05 '25 21:06 spaansba

You may want to take a look at the plugin we just created, as it does a lot of what you're looking for (link conversion, .md file generation, the ability to pass your own remark and rehype plugins to alter the generation, etc.)

The big difference in our approach is that we work with the routes passed to us during the postBuild cycle. We then use unified to find each HTML file and convert it back to Markdown. I chose to convert the rendered HTML back to Markdown in order to handle MDX partials and React components: in the source files they are not rendered yet, so their content may not come through properly.

I think the only things you're looking for that we don't currently have are:

A llms-full.txt generation option
The ability to overwrite the description for a page(route).

With how the code is made, both of these options would be easy to add.

Feel free to open any feature request or issues at: https://github.com/signalwire/docusaurus-plugins

Jun 05 '25 21:06 Devon-White

@Devon-White your plugin works great! Thanks for sharing 🙏 Implemented here https://github.com/cedarjs/cedar/pull/118/files Exposed at this url https://cedarjs.com/llms.txt

Jun 09 '25 13:06 Tobbe

Joining the party as well and sharing this in case it’s useful to anyone: I also followed the path of creating a custom plugin to generate /llms.txt and /llms-full.txt, tailored to my needs for Juno.

Note: I'm not an AI expert at all, more a real noob.

I may have taken a different approach by actually reverse-engineering the Markdown files from the generated HTML files - i.e., once the site is generated, I filter the files to build a tree of the information I consider important for the language model, and then generate a Markdown file for each of those HTML files using Turndown. In addition, I also use JSDOM to extract the title and description for each of those links.

Worth noting: when I generate the Markdown files, I also manipulate the links to construct a navigation structure related to those files - i.e., Markdown files link to other Markdown files, not to the original HTML.
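The link manipulation described here can be illustrated with a small regex-based sketch. `rewriteInternalLinks` is a hypothetical helper, not the code from the Juno plugin; it assumes internal links are root-relative, treats any path ending in something like `.png` as a file to leave alone, and maps the site root to `/index.md`.

```typescript
// Rewrite root-relative Markdown links so they target the generated .md files.
function rewriteInternalLinks(markdown: string): string {
  // Captures: link text, root-relative path, optional #hash.
  const linkPattern = /\[([^\]]*)\]\((\/[^)\s#]*)(#[^)\s]*)?\)/g;
  return markdown.replace(
    linkPattern,
    (match: string, text: string, path: string, hash: string | undefined) => {
      // Leave links that already point at a file (images, PDFs, .md) untouched.
      if (/\.[a-z0-9]+$/i.test(path)) return match;
      const trimmed =
        path.length > 1 && path.endsWith('/') ? path.slice(0, -1) : path;
      const target = trimmed === '/' ? '/index.md' : `${trimmed}.md`;
      return `[${text}](${target}${hash ?? ''})`;
    },
  );
}
```

External links are not matched because they do not start with `/`. A path whose last segment contains a dot, such as a version folder, would be mistaken for a file by this heuristic.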

The solution definitely needs more iteration, and it’s not my most brilliant work - I left performance considerations aside and haven’t written any tests yet 😅 - but it’s a start.

I scoped all the logic and functions into a single module.

You can find the plugin here 👉 https://github.com/junobuild/docs/blob/main/plugins/docusaurus.llms.plugin.ts

Jul 02 '25 06:07 peterpeterparker

Interesting blog post from a Sentry engineer implementing llms.txt: https://byk.im/posts/marking-it-up-and-down/

Since we cannot go directly from MDX to Markdown, we had to render the HTML from MDX first and then convert it to Markdown, essentially doubling the work.

Jul 02 '25 14:07 slorber

So, in addition to the custom plugin I shared above, I’ve added a CI job that snapshots the generated llms.txt files and commits any changes to the PR. It slows the pipeline down a bit, but this way I can review the diff and make sure the generated files still look good. These aren't formal tests, but they act as a sort of assertion. Long story short: in case that's useful to anyone alongside the plugin, here you go 👉 https://github.com/junobuild/docs/blob/main/.github/workflows/llms.yml

Jul 07 '25 11:07 peterpeterparker

Where did we end up with this?

Jul 22 '25 18:07 nicolasiscoding

Additionally, for the plugin I created above, I have created a theme that adds a Copy Page button with a dropdown for asking an LLM provider about the page's contents directly in its cloud chat. This mimics behavior that we are beginning to see from big documentation providers like Mintlify.

The copyButton will render on any page that has a MD file generated for it.

To accompany this, I also pushed a major upgrade for the plugin, which includes an overhaul of the config interface. For the most part it's just a reorganization, but I also introduced Sections as a feature. Sections allow you to specify exactly which docs are housed under a section. You can also add descriptions, title names, and subsections in a section object.

I also added the ability to add attachedFiles, which parses the contents of a file and creates a Markdown file with its contents inside. This is helpful for items that are useful for LLMs but not handy as a doc (an example would be an OpenAPI spec).

Right now these additions are on the alpha-tagged release, in case feedback is provided and changes are needed. Once it has been in a comfortable state for a while, I will do the official release.

The alpha packages however can be found here:

Plugin: https://www.npmjs.com/package/@signalwire/docusaurus-plugin-llms-txt/v/2.0.0-alpha.2

Theme: https://www.npmjs.com/package/@signalwire/docusaurus-theme-llms-txt/v/1.0.0-alpha.3

Oct 04 '25 11:10 Devon-White

@Devon-White Thanks for the plugin! It works like a charm: https://github.com/mlflow/mlflow/pull/18676

Nov 05 '25 06:11 harupy