[docs-infra] Create llms.txt

Open Janpot opened this issue 8 months ago • 3 comments

  • render react docs to markdown
  • serve under /<react-docs-path>.md. e.g. for /react/components/accordion add /react/components/accordion.md
  • add a /llms.txt
  • add a link to the markdown version on each page (I guess this is temporary, we can replace it with an "Ask AI" dropdown?) Screenshot 2025-05-05 at 20 23 47
  • currently the markdown is generated on the fly, right when the docs are built. We could also opt for pregenerating the files and checking them into our repo; this would give a stronger guarantee of correctness in the absence of tests.
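
For illustration, here is a minimal sketch of how an llms.txt index could be assembled from a list of docs pages. It is not the code from this PR: `Page`, `buildLlmsTxt`, and the sample title, summary and description are hypothetical placeholders, and the layout (an H1, a blockquote summary, then a section with a link list) follows the informal llms.txt convention rather than any formal spec.

```typescript
// Illustrative only: `Page`, `buildLlmsTxt`, and the sample data are hypothetical,
// not code from this PR. The layout follows the informal llms.txt convention.
type Page = { path: string; title: string; description: string };

function buildLlmsTxt(
  siteTitle: string,
  summary: string,
  siteOrigin: string,
  section: string,
  pages: Page[],
): string {
  const lines: string[] = [`# ${siteTitle}`, "", `> ${summary}`, "", `## ${section}`, ""];
  for (const page of pages) {
    // Each docs page /<path> is assumed to have a markdown twin at /<path>.md.
    lines.push(`- [${page.title}](${siteOrigin}${page.path}.md): ${page.description}`);
  }
  return lines.join("\n") + "\n";
}

// Placeholder data, made up for the example.
const llmsTxt = buildLlmsTxt(
  "Base UI",
  "Unstyled React components.",
  "https://base-ui.com",
  "Components",
  [{ path: "/react/components/accordion", title: "Accordion", description: "Collapsible panels." }],
);
console.log(llmsTxt);
```

Keeping the page list as plain data would make it easy to emit the same entries in other formats later.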

How does the mdx to md transformation currently work:

  • parse mdx files into mdx ast
  • statically analyze the jsx parts and render reference tables and demos, extract out the description and title
  • convert those demos and reference tables jsx AST into md AST.
  • error on unrecognized jsx
  • run the mdast stringify plugin to obtain markdown
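
The steps above can be sketched with a toy model. This is a simplified illustration, not the PR's implementation: the node types are stand-ins for the real MDX/mdast definitions, and the `Subtitle` and `Demo` handlers, `transform`, and `stringify` are hypothetical names. It shows the core idea: known JSX elements are mapped to markdown nodes, and anything unrecognized fails the build.

```typescript
// Toy AST types; stand-ins for the real MDX/mdast node definitions.
type JsxNode = { type: "jsx"; name: string; props: Record<string, string | undefined> };
type MdNode =
  | { type: "heading"; depth: number; text: string }
  | { type: "paragraph"; text: string }
  | { type: "code"; lang: string; value: string };
type DocNode = JsxNode | MdNode;

const FENCE = "`".repeat(3);

// Hypothetical handlers: one per JSX element the converter knows how to render.
const handlers: Record<string, (props: JsxNode["props"]) => MdNode[]> = {
  Subtitle: (props) => [{ type: "paragraph", text: props.children ?? "" }],
  Demo: (props) => [
    { type: "heading", depth: 2, text: "Demo" },
    { type: "code", lang: "tsx", value: props.source ?? "" },
  ],
};

// Replace every JSX node with markdown nodes; error on unrecognized JSX.
function transform(nodes: DocNode[]): MdNode[] {
  const out: MdNode[] = [];
  for (const node of nodes) {
    if (node.type !== "jsx") {
      out.push(node);
      continue;
    }
    const handler = handlers[node.name];
    if (!handler) {
      throw new Error(`Unrecognized JSX element: <${node.name}>`);
    }
    out.push(...handler(node.props));
  }
  return out;
}

// Minimal stand-in for the stringify step.
function stringify(nodes: MdNode[]): string {
  const blocks: string[] = [];
  for (const node of nodes) {
    if (node.type === "heading") {
      blocks.push(`${"#".repeat(node.depth)} ${node.text}`);
    } else if (node.type === "paragraph") {
      blocks.push(node.text);
    } else {
      blocks.push(`${FENCE}${node.lang}\n${node.value}\n${FENCE}`);
    }
  }
  return blocks.join("\n\n") + "\n";
}
```

Because every supported element needs its own handler, each new JSX element in the docs means another entry in the `handlers` table.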

Drawbacks

  • It only statically analyzes the JSX, which heavily restricts the JSX we can validly write
  • Introducing new syntax and jsx elements will be cumbersome

Alternatives that are more maintainable:

  • compile the mdx and render with react, use custom react reconciler to render the react tree to markdown instead of html.
  • render the docs to html and convert the html back to markdown. e.g. with https://www.npmjs.com/package/turndown

Will likely move to one of those methods so as not to disrupt the velocity of the project

To do:

  • [ ] enrich llms.txt?
  • [ ] post to voyage at the end of generation

Janpot avatar Apr 17 '25 14:04 Janpot

Deploy Preview for base-ui ready!

Name Link
Latest commit aaba18f94dd3682e938c57844d58d398e88f572c
Latest deploy log https://app.netlify.com/projects/base-ui/deploys/68820dba03e05a0008a2f0c9
Deploy Preview https://deploy-preview-1738--base-ui.netlify.app

netlify[bot] avatar Apr 17 '25 16:04 netlify[bot]

vite-css-base-ui-example

pnpm add https://pkg.pr.new/mui/base-ui/@base-ui-components/react@1738
pnpm add https://pkg.pr.new/mui/base-ui/@base-ui-components/utils@1738

commit: aaba18f

pkg-pr-new[bot] avatar May 05 '25 18:05 pkg-pr-new[bot]

Bundle size report

Bundle Parsed Size Gzip Size
@base-ui-components/react 0B(0.00%) 0B(0.00%)

Details of bundle changes

Generated by :no_entry_sign: dangerJS against aaba18f94dd3682e938c57844d58d398e88f572c

mui-bot avatar Jun 03 '25 12:06 mui-bot

@Janpot Is it better to have just a single llms.txt file in the sidebar that documents the entire project, rather than individual files for each page like this?

colmtuite avatar Jul 15 '25 15:07 colmtuite

> @Janpot Is it better to have just a single llms.txt file in the sidebar that documents the entire project, rather than individual files for each page like this?

I regularly find myself pasting a link to a documentation page in a chat window to ask questions or to stop the LLM from hallucinating when generating code. The idea in this PR is the simplest form of AI chat integration such as "copy markdown", "link to markdown", "ask questions about this page in chatgpt/claude",... shortcuts to bring the page content into an LLM context.

I tried to keep it minimal but still discoverable. Happy to do something else, from removing it altogether to a full-fledged menu as in those benchmarks, to adding an llms.txt link somewhere.

Janpot avatar Jul 15 '25 17:07 Janpot

@Janpot Yes but do you have a good sense of the utility of copying markdown/url for individual pages, versus a single url for the entire docs? If a single llms.txt for the entire docs worked well, that would be better obviously.

Ime, after being fed an entire docs llms.txt, they will often still mess up, or seemingly forget documentation, and still need to be reminded of specific docs sections. But also, ime, I've felt like simply sending regular urls can work well from that point.

Also, I wonder about the utility of linking to the markdown url versus just having a button to copy the url? I'm not sure I can think of a use case for visiting the url?

I feel like the best approach might be to have an "llms.txt" link in the left sidebar, under "Handbook", and then a "Copy for llms" button on each page. But I'm not sure, curious what you think.

colmtuite avatar Jul 16 '25 09:07 colmtuite

> Yes but do you have a good sense of the utility of copying markdown/url for individual pages, versus a single url for the entire docs? If a single llms.txt for the entire docs worked well, that would be better obviously.

Depends but quite often, on documentation websites, I want to show the llm a specific page. e.g. "read this specific page so you can stop hallucinating random details". I do that because I'm sure for many use-cases, the llms.txt doesn't contain enough information for it to find that page on its own. I see llms.txt more as something we can feed to an MCP, but I don't have a lot of experience with that.

> But also, ime, I've felt like simply sending regular urls can work well from that point.

Yes, but the point of the markdown is to reduce noise and amount of input tokens. I wouldn't be surprised if some users want to copy+paste a specific section from the markdown (to preserve some basic formatting).

> Also, I wonder about the utility of linking to the markdown url versus just having a button to copy the url? I'm not sure I can think of a use case for visiting the url?

I'd 90% of the time use it as "right-click > copy link address". I'm sure some people would rather copy the markdown itself and paste it in their chat. So for them opening the markdown page would be beneficial. I'm not really the biggest fan of copy buttons, I prefer it if websites let me manage my clipboard myself. But I have no strong opinion, I'll do whatever the base team wants.

> But I'm not sure, curious what you think.

Personally I'd stick to a more neutral "copy markdown" or "copy markdown link", but all is fine for me, I don't think it matters too much UX-wise whether it's a link, or a copy button and what the text is. I think we should do whatever you feel like is most "base ui docs like"

Janpot avatar Jul 16 '25 10:07 Janpot

Right yes ok. Copying sections of markdown is a good use case. Ok well I think a library-wide llms.txt link will be useful too, but can do it in a separate PR if you wish.

For this PR, the link should be a regular light gray link (same as other light gray links on the site). It should be located underneath the component description on mobile, and baseline-aligned with the component description on larger viewports.

Screenshot 2025-07-16 at 12 22 05

colmtuite avatar Jul 16 '25 11:07 colmtuite

@Janpot looks like the npm link in the header is now missing the version number. Also the route for the llms.txt link in the sidebar is missing .txt

colmtuite avatar Jul 16 '25 17:07 colmtuite

😄 well,... linking to /llms.txt works in Next.js dev mode only apparently. Not a surprise I guess, the next.js router doesn't know about these static files and does weird stuff anyway with routes that contain file extensions. npm version regression was introduced in https://github.com/mui/base-ui/pull/2167 (cc @michaldudak in case there was a specific reason to remove the version from root package.json?)

Janpot avatar Jul 16 '25 20:07 Janpot

@Janpot

For the sidebar llms.txt link, I was thinking of a link to the entire markdown for the whole docs website. Here's an example from Zero docs. Screenshot attached.

Screenshot 2025-07-22 at 10 04 00

Apparently this is commonly called llms-full.txt? idk, I'm not experienced with it, just figured I'd mention it. If you know that linking to the navigation links list works well, then let's go with that.

colmtuite avatar Jul 22 '25 09:07 colmtuite

> If you know that linking to the navigation links list works well, then let's go with that.

If I'm honest, nothing about llms.txt is properly standardized, and as far as I know, no major vendor is supporting any of this. I'm happy to create an additional llms-full.txt, it's not a lot of work. I think we should just do what the industry converges on (I have no idea what that is exactly though).

The problem I see with llms-full.txt is that it creates a massive amount of input tokens.

Janpot avatar Jul 22 '25 09:07 Janpot

@Janpot right ok, happy to go with whatever you think is best. ✅

colmtuite avatar Jul 22 '25 11:07 colmtuite

🤷 There's no proper spec, so who really knows what kind of links are allowed? imo, since it's a document published to the web, it must have an origin. Resolving urls in that context should be very well defined. Given how well this is specced, I bet even an LLM would do a great job at that task (though it should probably just be done by a tool). Anyway, the Vercel markdown version of the docs is full of relative urls, and the CircleCI version points to html files. None of them are doing the same thing.

But ok. Let's try to find what's the most common sensible thing. Here's what I propose we do:

  • inside .md files, all internal urls resolved to their .md variant, but we keep them root relative. This keeps them nice and portable and creates an interconnected set of markdown files. And none of the examples seems to suggest it's important to have absolute urls there. (so ./popover becomes /react/components/popover.md)
  • inside llms.txt all urls resolved to their absolute url. (so /react/components/popover becomes https://base-ui.com/react/components/popover.md)
  • inside llms-full.txt
    • all urls that would resolve to another markdown file, use a hash url to their title (so /react/components/popover becomes #popover, technically there could be duplicates between titles/subtitles I guess). This can signal an llm that the info the link points to is already in the document, and nothing needs to be fetched,
    • all other internal urls, resolve to their absolute url. (so /some/other/url becomes https://base-ui.com/some/other/url but we currently have none of these)
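
A rough sketch of these three rules as a single link rewriter. Only the rules themselves come from the proposal above; `rewriteLink`, `slugify`, the `PAGES` registry and the `Target` type are hypothetical, and how titles map to heading anchors is an assumption. External links are left untouched.

```typescript
// Hypothetical names throughout; only the rewriting rules come from the proposal.
type Target = "md" | "llms.txt" | "llms-full.txt";

const ORIGIN = "https://base-ui.com";

// Assumed registry of pages that have a markdown variant: path -> page title.
const PAGES = new Map<string, string>([
  ["/react/components/accordion", "Accordion"],
  ["/react/components/popover", "Popover"],
]);

// Assumed anchor format: lowercase title with non-alphanumerics collapsed to "-".
function slugify(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// `fromPath` is the root-relative path of the page that contains the link.
function rewriteLink(href: string, fromPath: string, target: Target): string {
  const url = new URL(href, ORIGIN + fromPath);
  if (url.origin !== ORIGIN) {
    return href; // external link: keep as written
  }
  const title = PAGES.get(url.pathname);
  if (title === undefined) {
    // Internal url without a markdown variant.
    return target === "md" ? url.pathname + url.search + url.hash : url.href;
  }
  if (target === "md") {
    return `${url.pathname}.md${url.hash}`; // root-relative .md variant
  }
  if (target === "llms.txt") {
    return `${ORIGIN}${url.pathname}.md${url.hash}`; // absolute .md url
  }
  return `#${slugify(title)}`; // llms-full.txt: content is already in the document
}

console.log(rewriteLink("./popover", "/react/components/accordion", "md"));
console.log(rewriteLink("/react/components/popover", "/", "llms.txt"));
console.log(rewriteLink("/react/components/popover", "/", "llms-full.txt"));
```

Resolving every link against the page's own URL first means relative and root-relative links go through the same code path before the per-target rule is applied.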

wdyt?

Supporting both llms.txt and llms-full.txt seems fine, as they have different use cases with different AIs based on their strengths or weaknesses

Which one would you link to from the sidebar?

Janpot avatar Jul 23 '25 09:07 Janpot

> Which one would you link to from the sidebar?

llms.txt is likely best there

I think this looks good now aside from that link ✅

atomiks avatar Jul 24 '25 08:07 atomiks