feat: generate markdown static files for LLM agent token optimization
Generate both HTML and Markdown versions of each documentation page to optimize token usage for LLM crawlers and AI agents. Research shows that serving markdown instead of HTML can reduce token consumption by 60-80%, significantly improving efficiency and reducing costs for AI-powered tools accessing documentation.
Reference: https://x.com/cramforce/status/1972430376149913715
Internal Slack conversation: https://ably-real-time.slack.com/archives/C07C48W7K1A/p1759170942282069
Implementation
- Added post-build hook to convert HTML pages to clean Markdown format
- Configured nginx content negotiation to serve markdown when requested
- Added validation script to ensure markdown generation completeness
- Integrated markdown generation into CI/CD pipeline
- Added UI button with markdown icon for user access (see below, common pattern in other sites)
Note: there is a corresponding PR in the website which ensures that requests with the Accept: text/markdown header are routed to the markdown file.
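For reviewers who want to reason about the routing rule, here is a minimal TypeScript sketch of the decision involved. It is not the nginx configuration from this PR or the website PR. The function name `prefersMarkdown` is hypothetical, and the parsing is deliberately simplified: it ignores wildcards and does not rank media types by q-value.

```typescript
// Hypothetical helper, not part of this PR: decides whether an Accept header
// explicitly asks for Markdown. Simplified on purpose: wildcards such as */*
// are ignored and q-values are not compared across media types.
function prefersMarkdown(acceptHeader: string | undefined): boolean {
  if (!acceptHeader) return false;
  return acceptHeader
    .split(",")
    .map((part) => part.trim().toLowerCase())
    .some((part) => {
      const [mediaType, ...params] = part.split(";").map((s) => s.trim());
      if (mediaType !== "text/markdown") return false;
      // An explicit q=0 means "not acceptable"; a malformed q is also rejected.
      const q = params.find((p) => p.startsWith("q="));
      return q === undefined || Number(q.slice(2)) > 0;
    });
}
```

Under this rule a plain browser request (`Accept: text/html`) keeps getting HTML, and only clients that name `text/markdown` are routed to the markdown file.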
Usage
Via content negotiation (for agents/crawlers):
curl -H "Accept: text/markdown" https://ably.com/docs/channels
Direct file access:
curl https://ably.com/docs/channels/index.md
Via UI: Click the Markdown icon button in the "Open In" section on any page
Technical Details
- Uses Turndown library for HTML to Markdown conversion
- Preserves code block language annotations
- Removes navigation, headers, footers and UI chrome
- Markdown files located at /docs/{page-path}/index.md
- Skips redirect pages (324 redirects detected)
- Successfully generates markdown for 209/210 content pages
- No frontmatter - clean markdown content only
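The completeness check described above can be sketched roughly as follows. This is an illustration, not the validation script in this PR: the `PageResult` shape and the `markdownPathFor` and `summarizeGeneration` names are assumptions, and only the `/docs/{page-path}/index.md` layout comes from the description.

```typescript
// Hypothetical shape of one build result; the real build output is not shown in this PR.
interface PageResult {
  urlPath: string; // e.g. "/docs/channels"
  isRedirect: boolean; // redirect pages are skipped, not converted
  markdown: string | null; // null when conversion failed
}

interface GenerationReport {
  redirectsSkipped: number;
  contentPages: number;
  generated: number;
  missing: string[]; // expected Markdown paths for content pages with no output
}

// Maps a docs URL path to the /docs/{page-path}/index.md layout.
function markdownPathFor(urlPath: string): string {
  const trimmed = urlPath.replace(/\/+$/, ""); // drop any trailing slashes
  return `${trimmed}/index.md`;
}

// Counts skipped redirects and reports content pages whose Markdown is absent or empty.
function summarizeGeneration(pages: PageResult[]): GenerationReport {
  const contentPages = pages.filter((page) => !page.isRedirect);
  const missing = contentPages
    .filter((page) => page.markdown === null || page.markdown.trim() === "")
    .map((page) => markdownPathFor(page.urlPath));
  return {
    redirectsSkipped: pages.length - contentPages.length,
    contentPages: contentPages.length,
    generated: contentPages.length - missing.length,
    missing,
  };
}
```

With the figures quoted above, a report of this shape would show 324 redirects skipped and 209 of 210 content pages generated, with the one failing page listed under `missing`.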
Also the newly-introduced CI check fails, so that should be looked at as well.
Yup, I saw that. I wanted to get feedback on this PR before I finalise any last issues (only one page out of 200+ fails to generate).
Thanks @jamiehenson for the feedback, thorough review.
Please see my comments, I'm keen to:
- Get your input on the language issue. However, I'd first like to understand whether the site is currently discoverable by language for LLMs/crawlers anyway. I recall that some time back I recorded issues with this when trying to crawl the site for LLMs.txt. Can I get an update on the status of that, and your thoughts on what I proposed?
- Do you know why https://ably-docs-markdown-supp-aqgfks.herokuapp.com/docs/getting-started/setup?lang=java gives an error? I'd prefer not to investigate that issue (it seems unrelated to any changes I have made), but it is a blocker to fixing the above issue (at least when testing on the staging site).
- FWIW, whilst I appreciate your feedback and will address it, I do think just landing this so that LLMs (not humans) can start using it is more important than getting it into a great state. I equally appreciate that you want the code to improve, not get worse, but I'm leaning far more towards getting shit done given the low cost of changing things with LLMs, as opposed to getting shit done with code we really like for these non-critical improvements. I recognise you may not agree :)
This has been superseded by https://github.com/ably/docs/pull/3000