docs feat: generate markdown static files for LLM agent token optimization

Generate both HTML and Markdown versions of each documentation page to optimize token usage for LLM crawlers and AI agents. Research shows that serving markdown instead of HTML can reduce token consumption by 60-80%, significantly improving efficiency and reducing costs for AI-powered tools accessing documentation.

Reference: https://x.com/cramforce/status/1972430376149913715

Internal Slack conversation: https://ably-real-time.slack.com/archives/C07C48W7K1A/p1759170942282069

Implementation

Added post-build hook to convert HTML pages to clean Markdown format
Configured nginx content negotiation to serve markdown when requested
Added validation script to ensure markdown generation completeness
Integrated markdown generation into CI/CD pipeline
Added UI button with markdown icon for user access (see below, common pattern in other sites)
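
For reviewers, the completeness check can be pictured roughly as follows. This is a minimal TypeScript sketch, not the script added in this PR: `validateMarkdown`, `markdownPathFor`, and the `ValidationReport` shape are hypothetical names, and it assumes each content page is emitted as `index.html` with a sibling `index.md`, with redirect pages excluded from the count.

```typescript
// Hypothetical report shape; the PR's actual validation script may differ.
interface ValidationReport {
  expected: number;   // content pages that should have markdown
  generated: number;  // content pages that do have markdown
  missing: string[];  // markdown paths that were not found
}

// Map an HTML output path to its markdown sibling,
// e.g. "docs/channels/index.html" -> "docs/channels/index.md".
function markdownPathFor(htmlPath: string): string {
  return htmlPath.replace(/\.html$/, ".md");
}

// Compare the built HTML pages against the generated markdown files.
// Redirect pages are skipped, so they never count as missing.
function validateMarkdown(
  htmlPages: string[],
  redirectPages: Set<string>,
  markdownFiles: Set<string>,
): ValidationReport {
  const contentPages = htmlPages.filter((page) => !redirectPages.has(page));
  const missing = contentPages
    .map(markdownPathFor)
    .filter((mdPath) => !markdownFiles.has(mdPath));
  return {
    expected: contentPages.length,
    generated: contentPages.length - missing.length,
    missing,
  };
}
```

A CI step built on something like this could fail the build whenever `missing` is non-empty, or tolerate a known list of exceptions while the last failing page is investigated.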

Note that there is a corresponding PR in the website which ensures the Accept: text/markdown header is used to route to the markdown file.

Usage

Via content negotiation (for agents/crawlers):

curl -H "Accept: text/markdown" https://ably.com/docs/channels

Direct file access:

curl https://ably.com/docs/channels/index.md

Via UI: Click the Markdown icon button in the "Open In" section on any page
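
The routing behind the first two access paths can be sketched as a small decision function. This is illustrative TypeScript only: the real routing is nginx configuration that is not reproduced here, `resolveDocsFile` is a hypothetical name, and the sketch deliberately ignores `q` weights in the Accept header (any mention of `text/markdown` selects markdown).

```typescript
// Decide which static file to serve for a docs request.
// Hypothetical helper; the real behaviour lives in nginx configuration.
function resolveDocsFile(path: string, acceptHeader: string | undefined): string {
  const base = path.replace(/\/+$/, ""); // drop trailing slashes

  // Direct file access: the caller already named a concrete file.
  if (/\.(md|html)$/.test(base)) {
    return base;
  }

  // Content negotiation: look for text/markdown among the accepted media types.
  // Simplification: q-values are ignored rather than ranked.
  const wantsMarkdown = (acceptHeader ?? "")
    .split(",")
    .map((part) => (part.split(";")[0] ?? "").trim().toLowerCase())
    .includes("text/markdown");

  return `${base}/index.${wantsMarkdown ? "md" : "html"}`;
}
```

Under these assumptions, a request for `/docs/channels` with `Accept: text/markdown` resolves to `/docs/channels/index.md`, while a typical browser Accept header resolves to `/docs/channels/index.html`.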

Technical Details

Uses Turndown library for HTML to Markdown conversion
Preserves code block language annotations
Removes navigation, headers, footers and UI chrome
Markdown files located at /docs/{page-path}/index.md
Skips redirect pages (324 redirects detected)
Successfully generates markdown for 209/210 content pages
No frontmatter - clean markdown content only
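
To illustrate what preserving code block language annotations means in practice, here is a dependency-free TypeScript sketch. It is not the Turndown rule used in this PR: `codeBlocksToFences` and `decodeEntities` are hypothetical helpers, and the sketch assumes code blocks are rendered as `<pre><code class="language-…">` with a small set of HTML entities.

```typescript
// Build the fence marker programmatically to keep this example easy to embed.
const FENCE = "`".repeat(3);

// Decode the handful of entities commonly found inside rendered code blocks.
function decodeEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&"); // decode &amp; last so "&amp;lt;" becomes "&lt;"
}

// Replace <pre><code class="language-xyz">...</code></pre> with a fenced block
// that carries the same language tag. Other HTML is left untouched.
function codeBlocksToFences(html: string): string {
  const pattern = /<pre><code(?:\s+class="([^"]*)")?>([\s\S]*?)<\/code><\/pre>/g;
  return html.replace(
    pattern,
    (_match: string, classes: string | undefined, body: string) => {
      const language = (classes ?? "").match(/(?:^|\s)language-([\w+-]+)/)?.[1] ?? "";
      const code = decodeEntities(body).replace(/\n$/, "");
      return `${FENCE}${language}\n${code}\n${FENCE}`;
    },
  );
}
```

A regex is enough for a sketch, but a real converter should work on a parsed DOM, which is what a library such as Turndown does.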

Sep 30 '25 21:09 mattheworiordan

[!IMPORTANT]

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

[!NOTE]

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Sep 30 '25 21:09 coderabbitai[bot]

Also the newly-introduced CI check fails, so that should be looked at as well.

Yup, I saw that. I wanted to get feedback on this PR before I finalise the last issue (only one page fails to generate out of 200+).

Oct 01 '25 21:10 mattheworiordan

Thanks @jamiehenson for the feedback, thorough review.

Please see my comments, I'm keen to:

Get your input on the language issue. However, I'd like to understand whether the site is currently discoverable by language by LLMs/crawlers anyway. I recall that some time back I recorded issues with this when trying to crawl the site for LLMs.txt. Can I get an update on the status of that, and your thoughts on what I proposed?
Do you know why https://ably-docs-markdown-supp-aqgfks.herokuapp.com/docs/getting-started/setup?lang=java gives an error? I'd prefer not to investigate that issue (it seems unrelated to any changes I have made), but it is a blocker to fixing the above issue (at least in testing on the staging site).
FWIW, whilst I appreciate your feedback and will address it, I do think just landing this so that LLMs (not humans) can start using it is more important than getting it into a great state. Equally, I appreciate and understand that you want code to improve, not get worse, but I'm leaning far more towards getting shit done given the low cost of changing things with LLMs, as opposed to getting shit done with code we really like for these non-critical improvements. I recognise you may not agree :)

Oct 01 '25 21:10 mattheworiordan

This has been superseded by https://github.com/ably/docs/pull/3000

Dec 11 '25 09:12 m-hulbert