Enable Context7 integration and automated documentation deployment for Relude

Open • johnhaley81 opened this issue 4 months ago • 1 comment

Background & Context

Context7 is a platform that provides up-to-date documentation for LLMs and AI code editors. It crawls documentation repositories and creates searchable, version-specific context. For Relude to be accessible through Context7, we need to:

  1. Provide documentation in Context7-compatible formats (markdown/mdx/txt/rst/ipynb)
  2. Ensure code examples are properly formatted and documented
  3. Set up automated documentation deployment
  4. Submit Relude to Context7's library index

Current State Analysis

What we have:

  • ✅ Comprehensive odoc comments in source code
  • ✅ Manual markdown documentation in /docs/ (Docsify)
  • ✅ Working API documentation at reazen.github.io/relude/api/
  • ✅ Rich code examples in source documentation

What's missing:

  • ❌ Automated documentation deployment pipeline
  • ❌ Markdown version of API documentation (odoc does not generate markdown)
  • ❌ Unified documentation structure for Context7 crawling
  • ❌ Documentation versioning system
  • ❌ Automated link checking to prevent broken links

Technical Requirements

Context7 Requirements

  • Documentation must be in markdown/mdx/txt/rst/ipynb format
  • Code snippets should be clearly formatted with language tags
  • Examples should include context and explanations
  • Documentation should be version-specific and up-to-date

odoc Limitations

  • Currently only outputs HTML, LaTeX, and man pages (no markdown)
  • Markdown output has been requested but not yet implemented
  • Solution: Use pandoc to convert HTML to markdown

Avoiding CI Infinite Loops

Critical consideration: When committing generated documentation back to the repository, we must prevent infinite CI loops.

Strategy to Prevent Loops:

  1. Use GITHUB_TOKEN - Pushes authenticated with GitHub's default token don't trigger new workflow runs
  2. Add [skip ci] to commits - Extra safety measure
  3. Use paths-ignore - Ignore changes to generated docs in CI triggers
  4. Selective triggers - Only run on releases and manual dispatch, not every push
  5. Change detection - Only commit if documentation actually changed

Proposed CI Configuration:

name: Generate Documentation
on:
  release:
    types: [published]
  workflow_dispatch: # Manual trigger
  push:
    branches: [main]
    paths-ignore:
      - 'docs/api-markdown/**'  # Ignore generated docs
      - '.github/workflows/generate-docs.yml'

jobs:
  generate-markdown-docs:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # Allow GITHUB_TOKEN to push the generated docs
    steps:
      - uses: actions/checkout@v3
        with:
          token: ${{ secrets.GITHUB_TOKEN }}  # Default token prevents loops
      
      - name: Setup OCaml/opam
        # ... setup steps
      
      - name: Generate and convert docs
        run: |
          # Build odoc HTML
          dune build @doc
          
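          # Convert HTML to Markdown using pandoc, mirroring odoc's directory layout.
          # (Embedding {} in the -o path would reuse the full _build/... path.)
          find _build/default/_doc/_html -name "*.html" | while IFS= read -r file; do
            rel_path=${file#_build/default/_doc/_html/}
            output_file="docs/api-markdown/${rel_path%.html}.md"
            mkdir -p "$(dirname "$output_file")"
            pandoc -f html -t markdown --preserve-tabs --wrap=none \
              "$file" -o "$output_file"
          done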
      
      - name: Commit if changed
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          
          # Check if there are changes
          if [[ -n $(git status -s docs/api-markdown/) ]]; then
            git add docs/api-markdown/
            git commit -m "Update API documentation [skip ci]"
            git push
          else
            echo "No documentation changes to commit"
          fi

Implementation Checklist

Phase 1: Research & Planning

  • [ ] Test pandoc conversion locally with odoc HTML output
  • [ ] Verify pandoc preserves OCaml/ReasonML code blocks correctly
  • [ ] Design directory structure for generated markdown
  • [ ] Create proof-of-concept conversion script
  • [ ] Test CI loop prevention strategies locally

Phase 2: Documentation Generation Script

  • [ ] Create scripts/generate-markdown-docs.sh that:
    • [ ] Builds odoc documentation (dune build @doc)
    • [ ] Finds all generated HTML files
    • [ ] Converts each HTML to markdown using pandoc with appropriate flags:
      pandoc -f html -t markdown --preserve-tabs --wrap=none \
             --extract-media=docs/api-markdown/media
      
    • [ ] Organizes markdown files in logical structure
    • [ ] Adds front matter for Context7 metadata
    • [ ] Generates module index file
  • [ ] Handle special cases:
    • [ ] Module type signatures
    • [ ] Code examples with proper language tags
    • [ ] Cross-references between modules
    • [ ] Preserve syntax highlighting hints
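
As a rough illustration of the "front matter" item above, here is a minimal bash sketch that prepends a YAML block to an already converted file. The function name add_front_matter and the field names (title, library, version) are assumptions made for this example; nothing in this issue confirms which metadata fields, if any, Context7 reads.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prepend YAML front matter to a generated markdown file.
# The fields below are illustrative only, not a schema documented by Context7.
add_front_matter() {
  local file=$1 title=$2 version=$3
  local tmp
  tmp=$(mktemp) || return 1
  {
    printf -- '---\n'
    printf 'title: "%s"\n' "$title"
    printf 'library: relude\n'
    printf 'version: "%s"\n' "$version"
    printf -- '---\n\n'
    cat "$file"                      # original converted content follows
  } > "$tmp" && mv "$tmp" "$file"    # replace the file only if writing succeeded
}

# Example call (path is a placeholder):
# add_front_matter docs/api-markdown/some-module.md "Some module" "3.0.0"
```

Writing to a temporary file first means a failed run leaves the original markdown untouched.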

Phase 3: GitHub Actions Workflow

  • [ ] Create .github/workflows/generate-docs.yml with:
    • [ ] Triggers: releases, manual dispatch, selective main pushes
    • [ ] OCaml/opam environment setup
    • [ ] Pandoc installation
    • [ ] Documentation generation script execution
    • [ ] Change detection before committing
    • [ ] Proper git configuration using GITHUB_TOKEN
    • [ ] Commit with [skip ci] message
  • [ ] Add paths-ignore to main CI workflow to ignore docs/api-markdown/**
  • [ ] Test workflow in a branch first to verify no loops

Phase 4: Documentation Structure

  • [ ] Organize generated documentation:
    docs/
    ├── api-markdown/           # Generated from odoc
    │   ├── v3.0.0/            # Version-specific docs
    │   │   ├── modules/       # Module documentation
    │   │   ├── index.md       # API index
    │   │   └── examples.md    # Extracted examples
    │   └── latest/            # Symlink to latest version
    ├── guides/                # Existing manual docs
    └── index.md              # Main entry point
    
  • [ ] Create navigation structure for Context7
  • [ ] Add module dependency information
  • [ ] Include type signature documentation
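
The latest/ entry in the tree above could be maintained with something like the following bash sketch. The function name update_latest is hypothetical, and whether GitHub Pages or the Context7 crawler follows symlinks has not been verified here, so copying the directory may turn out to be the safer option.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point <root>/latest at a version directory.
# A relative target keeps the link valid wherever the repo is checked out.
update_latest() {
  local root=$1 version=$2
  if [ ! -d "$root/$version" ]; then
    echo "missing version directory: $root/$version" >&2
    return 1
  fi
  # -n replaces an existing 'latest' link instead of descending into it
  ln -sfn "$version" "$root/latest"
}

# Example call:
# update_latest docs/api-markdown v3.0.0
```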

Phase 5: Quality Assurance

  • [ ] Verify pandoc conversion quality:
    • [ ] Code blocks properly formatted
    • [ ] Links converted correctly
    • [ ] Module signatures readable
    • [ ] Examples preserved
  • [ ] Test CI workflow:
    • [ ] No infinite loops
    • [ ] Proper versioning on releases
    • [ ] Manual trigger works
    • [ ] Change detection functions correctly
  • [ ] Validate Context7 compatibility:
    • [ ] Markdown properly formatted
    • [ ] Code examples tagged with language
    • [ ] Documentation structure crawlable
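
One way to automate the "code examples tagged with language" check is sketched below in bash. The function name find_untagged_fences is made up for this example. It only inspects backtick fences that start at the beginning of a line; indented code blocks, which pandoc may emit for untagged code, are not detected, so treat it as a first-pass check rather than a complete validator.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "file:line" for each opening backtick fence
# that carries no language tag.
find_untagged_fences() {
  local fence
  fence=$(printf '\140\140\140')     # three backticks, written as octal escapes
  awk -v fence="$fence" '
    FNR == 1 { in_block = 0 }        # reset state at the start of each file
    index($0, fence) == 1 {
      if (in_block) { in_block = 0; next }   # this line closes a block
      in_block = 1                           # this line opens a block
      rest = substr($0, 4)                   # text after the three backticks
      if (rest ~ /^[ \t]*$/) print FILENAME ":" FNR
    }
  ' "$@"
}

# Example call:
# find docs/api-markdown -name '*.md' -exec bash -c '...' \;  # or pass files directly:
# find_untagged_fences docs/api-markdown/index.md
```

An empty result means every opening fence it saw had some tag; it does not check that the tag is a language Context7 recognizes.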

Phase 6: GitHub Pages Deployment

  • [ ] Update existing GitHub Pages workflow to include markdown docs
  • [ ] Ensure both HTML and markdown docs are deployed
  • [ ] Set up proper directory structure for serving
  • [ ] Configure redirects if needed
  • [ ] Test accessibility of markdown documentation

Phase 7: Context7 Integration

  • [ ] Prepare submission to Context7:
    • [ ] Documentation URL: https://github.com/reazen/relude/tree/main/docs/api-markdown
    • [ ] Library metadata (name, description, version)
    • [ ] Code example format verification
  • [ ] Submit via Context7 web interface or GitHub PR
  • [ ] Monitor crawling and indexing
  • [ ] Test Context7 search results
  • [ ] Verify version-specific documentation access

Phase 8: Documentation Maintenance

  • [ ] Update PUBLISHING.md with:
    • [ ] Documentation generation process
    • [ ] How to trigger manual documentation builds
    • [ ] Troubleshooting CI loops
  • [ ] Create documentation guidelines:
    • [ ] odoc comment format standards
    • [ ] Code example requirements
    • [ ] Module documentation structure
  • [ ] Set up monitoring:
    • [ ] Check for broken links
    • [ ] Verify Context7 indexing
    • [ ] Monitor documentation build times
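
For the "check for broken links" item, a small bash sketch follows. The function name check_relative_links is hypothetical. It only handles inline links of the form [text](target), skips external URLs, and checks that the target file exists relative to the linking file; it does not validate #anchors or reference-style links, so a dedicated link checker may still be preferable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report relative markdown links whose target file is missing.
# Returns 1 if any broken link was found, 0 otherwise.
check_relative_links() {
  local file=$1 dir target path status=0
  dir=$(dirname "$file")
  while IFS= read -r target; do
    case $target in
      http://*|https://*|mailto:*|'#'*) continue ;;   # external or same-page links
    esac
    path=${target%%#*}                                # drop any #fragment
    if [ ! -e "$dir/$path" ]; then
      echo "$file: broken link -> $target"
      status=1
    fi
  done < <(grep -o ']([^)]*)' "$file" | sed 's/^](//; s/)$//')
  return "$status"
}

# Example call:
# check_relative_links docs/api-markdown/index.md
```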

Technical Implementation Details

Pandoc Conversion Command

# Convert single file
pandoc -f html -t markdown \
  --preserve-tabs \
  --wrap=none \
  --extract-media=docs/api-markdown/media \
  -o output.md input.html

# Batch conversion with proper directory structure
find _build/default/_doc/_html -name "*.html" | while IFS= read -r file; do
  # Strip the build prefix so the output mirrors odoc's directory layout
  rel_path=${file#_build/default/_doc/_html/}
  output_file="docs/api-markdown/${rel_path%.html}.md"
  mkdir -p "$(dirname "$output_file")"
  pandoc -f html -t markdown --preserve-tabs --wrap=none "$file" -o "$output_file"
done
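
The "module index file" from Phase 2 could be produced after conversion with a sketch like this one. The function name generate_index and the "API index" heading are assumptions for illustration; the link labels are simply the relative paths without the .md suffix, which may need prettifying for real module names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write <root>/index.md listing every converted markdown file.
generate_index() {
  local root=$1 file rel
  {
    printf '# API index\n\n'
    # Skip only the top-level index itself; per-module index.md files are kept
    find "$root" -name '*.md' ! -path "$root/index.md" | LC_ALL=C sort |
      while IFS= read -r file; do
        rel=${file#"$root"/}
        printf -- '- [%s](%s)\n' "${rel%.md}" "$rel"
      done
  } > "$root/index.md.tmp" && mv "$root/index.md.tmp" "$root/index.md"
}

# Example call:
# generate_index docs/api-markdown
```

Sorting with LC_ALL=C keeps the order stable across machines, which helps the change-detection step avoid spurious commits.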

Git Configuration for Safe Commits

# Use GitHub Actions bot identity
git config user.name "github-actions[bot]"
git config user.email "github-actions[bot]@users.noreply.github.com"

# Always use [skip ci] in commit messages
git commit -m "chore: update API documentation [skip ci]"

# Use GITHUB_TOKEN (default) not a PAT
# This automatically prevents workflow retriggering

Success Criteria

  • [ ] Documentation automatically generates on releases without CI loops
  • [ ] All API documentation available in markdown format
  • [ ] Pandoc conversion preserves code examples and formatting
  • [ ] Generated documentation commits don't trigger new workflows
  • [ ] Relude indexed and searchable in Context7
  • [ ] Version-specific documentation properly organized
  • [ ] Zero broken documentation links
  • [ ] Documentation passes Context7's quality checks

johnhaley81 • Aug 12 '25 17:08

Great news! I discovered that odoc 3.1.0 (released on July 15, 2025) now includes an experimental markdown generator that could significantly simplify our documentation workflow.

Key Finding

PR ocaml/odoc#1341 added native markdown output support to odoc. This means we might be able to skip the pandoc HTML-to-markdown conversion step entirely and generate markdown documentation directly from our ReasonML source code.

What This Changes

Instead of the current proposed workflow:

odoc → HTML → pandoc → Markdown

We could potentially use:

odoc → Markdown (native)

Benefits for Our Use Case

  1. Simpler CI pipeline - No need for pandoc installation and HTML parsing
  2. Better fidelity - Direct markdown generation preserves documentation intent better than HTML conversion
  3. Designed for our ecosystem - The PR was specifically built with Melange documentation in mind
  4. Cleaner output - Native generation avoids conversion artifacts

Next Steps to Investigate

  • Test the experimental markdown generator with our codebase
  • Verify compatibility with Context7's markdown requirements
  • Check if it handles our ReasonML syntax properly (the PR mentions future support for both ReasonML and OCaml syntax rendering)

Caveat

The feature is marked as "experimental", so we should test thoroughly. Known limitations include incomplete support for references in markdown headings and some complex documentation features, but these might not affect our use case.

This could make the pandoc HTML-to-Markdown conversion step in Phase 2 (and the pandoc installation in Phase 3) unnecessary and simplify our entire documentation pipeline! 🚀

johnhaley81 • Aug 12 '25 22:08