[docs-infra] Create llms.txt
- render react docs to markdown
- serve them under `/<react-docs-path>.md`, e.g. for `/react/components/accordion` add `/react/components/accordion.md`
- add a `/llms.txt`
- add a link to the markdown version on each page (I guess this is temporary; we could replace it with an "Ask AI" dropdown?)
- currently the markdown is generated on the fly, when the docs are built. We could also pregenerate it and check it into our repo, which would give a stronger guarantee of correctness in the absence of tests.
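A minimal sketch of the route convention and the `/llms.txt` index from the list above. It assumes the docs build can produce a list of pages with a path, title, description and sidebar section; the `DocsPage` type, `toMarkdownPath` and `buildLlmsTxt` are hypothetical names, not code from this PR. The index layout (H1 title, blockquote summary, link lists under H2 sections) follows the llms.txt proposal.

```typescript
// Hypothetical shape of a page as produced by the docs build.
interface DocsPage {
  path: string; // e.g. "/react/components/accordion"
  title: string;
  description: string;
  section: string; // sidebar group, e.g. "Components"
}

// Map a docs route to its markdown twin: /react/components/accordion -> /react/components/accordion.md
function toMarkdownPath(path: string): string {
  return `${path.replace(/\/+$/, "")}.md`;
}

// Build an llms.txt index: H1 title, blockquote summary, then one H2 list per sidebar section.
function buildLlmsTxt(origin: string, title: string, summary: string, pages: DocsPage[]): string {
  // Group pages by section, preserving sidebar order.
  const sections = new Map<string, DocsPage[]>();
  for (const page of pages) {
    const list = sections.get(page.section) ?? [];
    list.push(page);
    sections.set(page.section, list);
  }

  const lines = [`# ${title}`, "", `> ${summary}`];
  for (const [section, sectionPages] of sections) {
    lines.push("", `## ${section}`, "");
    for (const page of sectionPages) {
      // Links in llms.txt point at the absolute URL of the .md variant.
      const url = new URL(toMarkdownPath(page.path), origin).href;
      lines.push(`- [${page.title}](${url}): ${page.description}`);
    }
  }
  return lines.join("\n") + "\n";
}
```

Keeping the `.md` mapping in one function means the page links, the sitemap and the index cannot drift apart.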
How the MDX-to-markdown transformation currently works:
- parse MDX files into an MDX AST
- statically analyze the JSX parts to render reference tables and demos, and extract the title and description
- convert the JSX AST of those demos and reference tables into an mdast (markdown AST)
- error on unrecognized JSX
- run the mdast stringify plugin to obtain markdown
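The expand-then-stringify steps above can be sketched on a toy tree. This is a simplified illustration, not the PR's code: `DocNode` is a hand-rolled stand-in for the much richer MDX AST/mdast node types, and the `Demo` handler and its output are hypothetical. It shows the key property: known JSX elements are statically expanded into markdown nodes, and anything unrecognized fails the build.

```typescript
// Simplified stand-ins for MDX AST / mdast nodes (the real node types are richer).
type Attributes = Record<string, string>;
type DocNode =
  | { type: "heading"; depth: number; text: string }
  | { type: "paragraph"; text: string }
  | { type: "code"; lang: string; value: string }
  | { type: "jsx"; name: string; attributes: Attributes };

// Each known JSX element name maps to a function that statically expands it into markdown nodes.
type JsxHandler = (attributes: Attributes) => DocNode[];

function expandJsx(nodes: DocNode[], handlers: Map<string, JsxHandler>): DocNode[] {
  return nodes.flatMap((node): DocNode[] => {
    if (node.type !== "jsx") {
      return [node];
    }
    const handler = handlers.get(node.name);
    if (!handler) {
      // Fail loudly instead of silently dropping content from the markdown output.
      throw new Error(`Unrecognized JSX element <${node.name}>`);
    }
    // Handlers may emit further JSX nodes, so expand recursively.
    return expandJsx(handler(node.attributes), handlers);
  });
}

const FENCE = "`".repeat(3);

// Minimal stringifier standing in for the mdast stringify step.
function stringifyNode(node: DocNode): string {
  switch (node.type) {
    case "heading":
      return `${"#".repeat(node.depth)} ${node.text}`;
    case "paragraph":
      return node.text;
    case "code":
      return `${FENCE}${node.lang}\n${node.value}\n${FENCE}`;
    default:
      throw new Error(`Unexpanded JSX element <${node.name}>`);
  }
}

function mdxToMarkdown(tree: DocNode[], handlers: Map<string, JsxHandler>): string {
  return expandJsx(tree, handlers).map(stringifyNode).join("\n\n") + "\n";
}

// Hypothetical handler: a real build would read the demo's source files here.
const exampleHandlers = new Map<string, JsxHandler>([
  ["Demo", (attrs) => [{ type: "code", lang: "tsx", value: `// source of ${attrs.path}` }]],
]);
```

Every new JSX element used in the docs needs its own handler here, which is exactly the maintenance cost listed under the drawbacks.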
Drawbacks:
- It only statically analyzes the JSX, which severely restricts the JSX we are able to write
- Introducing new syntax and JSX elements will be cumbersome
Alternatives that are more maintainable:
- compile the MDX and render it with React, using a custom React reconciler that renders the React tree to markdown instead of HTML
- render the docs to HTML and convert the HTML back to markdown, e.g. with https://www.npmjs.com/package/turndown
We will likely move to one of these methods so as not to slow down the project's velocity.
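To illustrate why the first alternative is more flexible, here is a toy renderer that walks an element tree and emits markdown for host elements instead of DOM nodes. It deliberately does not use React or `react-reconciler`: `MdElement`, `h` and the host element names (including `"fragment"`) are made up for the sketch. A real custom reconciler would do roughly the host-element mapping in its host config (methods such as `createInstance`). The point is that components are executed, so whatever JSX they return is supported without statically analyzing the MDX source.

```typescript
// Simplified stand-in for React elements (not React's real types).
type Props = Record<string, unknown>;
type Child = MdElement | string;
type Component = (props: Props) => Child;

interface MdElement {
  type: string | Component;
  props: Props;
  children: Child[];
}

// JSX-factory-like helper so the tree can be built without a JSX transform.
function h(type: string | Component, props: Props | null, ...children: Child[]): MdElement {
  return { type, props: props ?? {}, children };
}

function renderToMarkdown(node: Child): string {
  if (typeof node === "string") {
    return node;
  }
  if (typeof node.type === "function") {
    // Run the component, like React would, and render whatever it returns.
    return renderToMarkdown(node.type({ ...node.props, children: node.children }));
  }
  const inner = node.children.map(renderToMarkdown).join("");
  // Host elements become markdown text (react-dom would create DOM nodes here).
  switch (node.type) {
    case "fragment":
      return inner;
    case "h1":
      return `# ${inner}\n\n`;
    case "h2":
      return `## ${inner}\n\n`;
    case "p":
      return `${inner}\n\n`;
    case "code":
      return "`" + inner + "`";
    case "a":
      return `[${inner}](${String(node.props.href)})`;
    default:
      throw new Error(`No markdown mapping for host element <${node.type}>`);
  }
}

// Example component: it computes its link at render time, which static analysis could not follow.
const ComponentLink: Component = (props) =>
  h("a", { href: `/react/components/${String(props.name)}.md` }, String(props.name));
```

Only the small set of host elements needs a markdown mapping; new docs components are free as long as they render down to those.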
To do:
- [ ] enrich llms.txt?
- [ ] post to voyage at the end of generation
Deploy Preview for base-ui ready!
| Name | Link |
|---|---|
| Latest commit | aaba18f94dd3682e938c57844d58d398e88f572c |
| Latest deploy log | https://app.netlify.com/projects/base-ui/deploys/68820dba03e05a0008a2f0c9 |
| Deploy Preview | https://deploy-preview-1738--base-ui.netlify.app |
pnpm add https://pkg.pr.new/mui/base-ui/@base-ui-components/react@1738
pnpm add https://pkg.pr.new/mui/base-ui/@base-ui-components/utils@1738
commit: aaba18f
Bundle size report
| Bundle | Parsed Size | Gzip Size |
|---|---|---|
| @base-ui-components/react | 0B(0.00%) | 0B(0.00%) |
Generated by :no_entry_sign: dangerJS against aaba18f94dd3682e938c57844d58d398e88f572c
@Janpot Is it better to have just a single llms.txt file in the sidebar that documents the entire project, rather than individual files for each page like this?
I regularly find myself pasting a link to a documentation page into a chat window to ask questions, or to stop the model from hallucinating when it generates code. The idea in this PR is the simplest form of AI chat integration: shortcuts such as "copy markdown", "link to markdown", or "ask questions about this page in ChatGPT/Claude" that bring the page content into an LLM's context.
I tried to keep it minimal but still discoverable. Happy to do something else, from removing it altogether, to a full-fledged menu as in those benchmarks, to adding an llms.txt link somewhere.
@Janpot Yes, but do you have a good sense of the utility of copying markdown/URLs for individual pages, versus a single URL for the entire docs? If a single llms.txt for the entire docs worked well, that would obviously be better.
IME, after being fed an entire docs llms.txt, they will often still mess up, or seemingly forget documentation, and still need to be reminded of specific docs sections. But also, IME, I've felt like simply sending regular URLs can work well from that point.
Also, I wonder about the utility of linking to the markdown url versus just having a button to copy the url? I'm not sure I can think of a use case for visiting the url?
I feel like the best approach might be to have an "llms.txt" link in the left sidebar, under "Handbook", and then a "Copy for llms" button on each page. But I'm not sure, curious what you think.
> Yes, but do you have a good sense of the utility of copying markdown/URLs for individual pages, versus a single URL for the entire docs? If a single llms.txt for the entire docs worked well, that would obviously be better.

It depends, but quite often on documentation websites I want to show the LLM a specific page, e.g. "read this specific page so you can stop hallucinating random details". I do that because I'm sure that for many use cases, the llms.txt doesn't contain enough information for the model to find that page on its own. I see llms.txt more as something we can feed to an MCP server, but I don't have a lot of experience with that.
> But also, IME, I've felt like simply sending regular URLs can work well from that point.
Yes, but the point of the markdown is to reduce noise and amount of input tokens. I wouldn't be surprised if some users want to copy+paste a specific section from the markdown (to preserve some basic formatting).
> Also, I wonder about the utility of linking to the markdown url versus just having a button to copy the url? I'm not sure I can think of a use case for visiting the url?

90% of the time I'd use it as "right-click > copy link address". I'm sure some people would rather copy the markdown itself and paste it into their chat, so for them opening the markdown page would be beneficial. I'm not the biggest fan of copy buttons; I prefer it when websites let me manage my clipboard myself. But I have no strong opinion, I'll do whatever the Base UI team wants.
> But I'm not sure, curious what you think.

Personally I'd stick to a more neutral "copy markdown" or "copy markdown link", but anything is fine with me. I don't think it matters much UX-wise whether it's a link or a copy button, or what the text is. We should do whatever you feel is most "Base UI docs"-like.
Right, yes, OK. Copying sections of markdown is a good use case. I think a library-wide llms.txt link will be useful too, but we can do that in a separate PR if you wish.
For this PR, the link should be a regular light gray link (same as other light gray links on the site). It should be located underneath the component description on mobile, and baseline-aligned with the component description on larger viewports.
@Janpot looks like the npm link in the header is now missing the version number. Also the route for the llms.txt link in the sidebar is missing .txt
😄 Well... linking to /llms.txt apparently only works in Next.js dev mode. Not a surprise I guess: the Next.js router doesn't know about these static files, and it does weird stuff anyway with routes that contain file extensions. The npm version regression was introduced in https://github.com/mui/base-ui/pull/2167 (cc @michaldudak in case there was a specific reason to remove the version from the root package.json?)
@Janpot
For the sidebar llms.txt link, I was thinking of a link to the markdown for the whole docs website. Here's an example from the Zero docs (screenshot attached).
Apparently this is commonly called llms-full.txt? idk, I'm not experienced with it, just figured I'd mention it. If you know that linking to the navigation links list works well, then let's go with that.
> If you know that linking to the navigation links list works well, then let's go with that.
If I'm honest, nothing about llms.txt is properly standardized, and as far as I know, no major vendor supports any of this. I'm happy to create an additional llms-full.txt, it's not a lot of work. I think we should just do whatever the industry converges on (I have no idea what that is exactly, though).
The problem I see with llms-full.txt is that it creates a massive amount of input tokens.
@Janpot right ok, happy to go with whatever you think is best. ✅
🤷 There's no proper spec, so who really knows what kind of links are allowed? IMO, since it's a document published to the web, it must have an origin, and resolving URLs in that context is very well defined. Given how thoroughly URL resolution is specced, I bet even an LLM would do a great job at that task (though it should probably just be done by a tool). Anyway, Vercel's markdown version of its docs is full of relative URLs, and CircleCI's version points to HTML files. None of them do the same thing.
But OK, let's try to find the most common sensible thing. Here's what I propose we do:
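For reference, this is what "resolving URLs in that context" means in practice: WHATWG URL resolution against the URL of the document the link appears in, available as the global `URL` constructor in browsers and Node.js. `resolveLink` is only an illustrative helper. The last two cases show the one real pitfall with relative links: whether the base URL ends with a slash.

```typescript
// Resolve a link relative to the URL of the document it appears in,
// as any WHATWG-URL-compliant tool would.
function resolveLink(href: string, documentUrl: string): string {
  return new URL(href, documentUrl).href;
}

// resolveLink("./popover", "https://base-ui.com/react/components/accordion.md")
//   -> "https://base-ui.com/react/components/popover"
// resolveLink("/react/components/popover.md", "https://base-ui.com/llms.txt")
//   -> "https://base-ui.com/react/components/popover.md"
// Trailing slash matters: without it, the last path segment is replaced.
// resolveLink("popover.md", "https://base-ui.com/react/components")
//   -> "https://base-ui.com/react/popover.md"
// resolveLink("popover.md", "https://base-ui.com/react/components/")
//   -> "https://base-ui.com/react/components/popover.md"
```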
- inside .md files, resolve all internal URLs to their .md variant, but keep them root-relative (so `./popover` becomes `/react/components/popover.md`). This keeps them nice and portable and creates an interconnected set of markdown files. None of the examples seem to suggest it's important to have absolute URLs there.
- inside llms.txt, resolve all URLs to their absolute URL (so `/react/components/popover` becomes `https://base-ui.com/react/components/popover.md`)
- inside llms-full.txt:
  - all URLs that would resolve to another markdown file use a hash URL pointing to that page's title (so `/react/components/popover` becomes `#popover`; technically there could be duplicates between titles/subtitles, I guess). This can signal to an LLM that the information the link points to is already in the document and nothing needs to be fetched.
  - all other internal URLs resolve to their absolute URL (so `/some/other/url` becomes `https://base-ui.com/some/other/url`, but we currently have none of these)
wdyt?
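A sketch of the three rules as a single link-rewriting function, assuming the build knows which routes have a markdown twin and what their titles are (the `PageTitles` map). `rewriteLink`, `slugify` and the GitHub-style slug format are assumptions for illustration, as is keeping non-twin internal links root-relative inside .md files, which the proposal doesn't specify.

```typescript
// Where the rewritten markdown will be published.
type Target = "md" | "llms.txt" | "llms-full.txt";

// Production origin, taken from the examples in this thread.
const ORIGIN = "https://base-ui.com";

// Hypothetical: docs routes that have a markdown twin, mapped to their page title.
type PageTitles = Map<string, string>;

// Assumption: hash targets use a GitHub-style slug of the page title.
function slugify(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9\s-]/g, "")
    .replace(/\s+/g, "-");
}

// Rewrite one link found on the page at `fromPath` (e.g. "/react/components/accordion").
function rewriteLink(href: string, fromPath: string, target: Target, titles: PageTitles): string {
  const url = new URL(href, ORIGIN + fromPath);
  if (url.origin !== ORIGIN) {
    return href; // external links are left alone
  }
  const route = url.pathname.replace(/\.md$/, "").replace(/\/+$/, "");
  const title = titles.get(route);

  if (title === undefined) {
    // Internal URL without a markdown twin (e.g. /some/other/url).
    // Assumption: keep it root-relative in .md files; llms.txt and llms-full.txt get the absolute URL.
    return target === "md" ? url.pathname + url.search + url.hash : url.href;
  }
  if (target === "md") {
    return `${route}.md${url.hash}`; // root-relative .md variant
  }
  if (target === "llms.txt") {
    return `${ORIGIN}${route}.md${url.hash}`; // absolute .md URL
  }
  // llms-full.txt: the page is already inside the document, so point at its title.
  return `#${slugify(title)}`;
}
```

Usage: `rewriteLink("./popover", "/react/components/accordion", "md", new Map([["/react/components/popover", "Popover"]]))` returns `/react/components/popover.md`. Duplicate titles would need a de-duplication step (e.g. a numeric suffix), which this sketch leaves out.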
Supporting both llms.txt and llms-full.txt seems fine, as they serve different use cases with different AIs, depending on their strengths and weaknesses.
Which one would you link to from the sidebar?
llms.txt is likely best there
I think this looks good now aside from that link ✅