
[WIP]: Add support for an MCP to analyse and prioritise PRs

Steboss opened this issue 2 months ago • 0 comments

Before submitting
  • [ ] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Did you make sure to update the docs?
  • [ ] Did you write any new necessary tests?

What does this PR do?

This PR adds a Model Context Protocol (MCP) server for intelligent PR analysis and prioritization in the Lightning-Thunder repository. The server enables maintainers and contributors to efficiently triage and review the PR queue using both heuristic analysis and LLM-powered insights.

Users can generate both heuristic and LLM-based evaluations of the PRs to prioritize which PRs need to be reviewed as soon as possible.

After installing the MCP in Cursor as:

{
  "mcpServers": {
    "thunder-dev-mcp": {
      "type": "stdio",
      "command": "PATH TO YOUR PYTHON BIN",
      "args": ["/PATH TO/lightning-thunder/mcp_server/server.py"],
      "env": {
      }
    }
  }
}

it's possible to run the following examples:

  • List all open PRs: ask Cursor: "Can you list the open PRs for lightning-thunder?"
  • Analyze a specific PR: "Analyze PR #1234 from lightning-thunder"
  • Get prioritized PRs: "Show me the top priority PRs for lightning-thunder with a score above 40"
  • Check stale PRs: "Find all PRs that haven't been updated in 45 days"
  • Generate risk report: "Generate a risk report for all high-risk PRs (score >= 6)"
  • Advanced: Generate LLM priority prompt: "Generate an LLM priority prompt for PRs: 1234, 1235, 1236"
  • Run a full heuristic + LLM-based analysis of PRs: "Can you run an LLM batch analysis on all the open lightning-thunder PRs please?"

The last approach will return a summary like this:

### 1. LLM Priority Scores (0-100)
PR #2689: Delay Inductor until we get real input tensors
Priority Score: 85/100
This PR fixes a guard failure issue (#2680) with Inductor-compiled submodules and has 2 approvals. The change is targeted and addresses a real runtime bug affecting the Thunder-Inductor integration.
PR #2652: Move DTensor isinstance check from runtime to trace construction time
Priority Score: 75/100
Moves DTensor detection from runtime to trace construction, improving performance and correctness. Already has 1 approval and good engagement, addressing a real optimization opportunity.
PR #2675: Record last_trace_cache_stop when miss the cache
Priority Score: 70/100
Fixes a clear bug in cache profiling where stop time is only recorded on hits. Simple 2-line fix that improves observability and correctness.
PR #2697: benchmark_inference: Add CLI option to enable thunder CUDAGraph Transform
Priority Score: 60/100
Has 3 approvals and adds useful benchmarking capability, but has conflicts that need resolution. Testing/benchmarking infrastructure improvement.
PR #2695: benchmark_inference: Allow passing cache option as cli arg
Priority Score: 55/100
Similar to #2697, has 2 approvals but needs conflict resolution. Enhances benchmarking flexibility.
...

### 2. Prioritized Review Order
### 🔥 CRITICAL (Review Today):
PR #2689 - Delay Inductor until we get real input tensors
Why: Fixes guard failures, 2 approvals, no conflicts, addresses real bug
Action: Final review and merge
PR #2675 - Record last_trace_cache_stop when miss the cache
Why: 2-line bug fix in profiling, clear and correct
Action: Quick review and merge
PR #2652 - Move DTensor isinstance check from runtime to trace construction time
Why: Performance optimization, 1 approval, clean changes
Action: Final review and merge
...

### 🚨 HIGH (Review This Week):
PR #2553 - Fix gradient mismatch in checkpointing by preserving node metadata
Why: Correctness fix for checkpointing gradients, small surgical change
Action: Verify the fix and merge
PR #2697 & #2695 - Benchmark infrastructure improvements
Why: Both have approvals but need conflict resolution
Action: Resolve conflicts, then merge
PR #2305 - Add torch.scaled_mm
Why: Important for quantization workflows, active discussion
Action: Address remaining review comments and merge
...

### ⚠️ MEDIUM (Review When Possible):
PR #2184 - Advanced indexing support for n-dimensional integer indexing tensors
Why: Significant feature enhancement, needs fresh eyes after staleness
Action: Full review, verify correctness with comprehensive tests
PR #2071 - add liger kernel executor
Why: Strategic performance addition, needs CI fixes
Action: Help contributor fix CI, then review
PR #2340 - Add cutlass-python-dsl executor
Why: Large strategic addition, forward-only currently
Action: Assess backward support plan, then review
...

### 3. Key Recommendations
✅ Safe to Merge Immediately (after quick final check):
PR #2689 - Inductor timing fix (2 approvals)
PR #2675 - Cache profiling bug fix (clear 2-line fix)
PR #2652 - DTensor check optimization (1 approval)
PR #2698 - Re-enable zero-size test cases
PR #2678 - Pre-commit updates
🔧 Needs Changes Before Merging:
PR #2697, #2695 - Resolve merge conflicts
PR #2071 - Fix CI failures
PR #2305 - Address stride/dtype edge cases from review comments
PR #2585 - Needs thorough review and testing for large addition
🚧 Blockers for Other Work:
PR #2689 - Blocks any Inductor integration work
PR #2652 - Blocks DTensor performance work
...
### 4. Overall Assessment
Queue Health: MODERATE (⚠️)
The PR queue shows 50 open PRs with a concerning long tail of stale contributions. However, there are several high-quality, ready-to-merge PRs at the top of the funnel.
Positive Signs:
Multiple PRs have approvals and are ready to merge (#2689, #2697, #2695, #2652)
Recent activity on critical paths (Inductor integration, DTensor, benchmarking)
Good mix of bug fixes and feature additions
Active contributors addressing real issues
Concerning Patterns:
28 PRs are stale (>30 days since update) with no recent engagement
Large PRs (+500 lines) sitting without review (#2585, #2340, #2071, #2633)
Many "WIP" PRs with unclear status or completion plan
Linting PRs languishing despite being trivial to merge
Several deprecated/obsolete PRs that should be closed

Key Features

1. Multi-dimensional Heuristic Analysis

  • Risk scoring across three dimensions:
    • Breaking Changes: Detects API modifications, deprecations, and large changesets
    • Security: Identifies security-related keywords and sensitive file changes
    • Urgency: Assesses criticality based on keywords, staleness, and community engagement
  • Priority scoring (0-100) combining risk factors, review status, and merge readiness
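To make the scoring concrete, here is a minimal sketch of how risk dimensions and review status might combine into a 0-100 priority score. The field names and weights are illustrative assumptions, not the server's actual implementation:

```python
# Hypothetical sketch of the heuristic priority score described above.
# Weights and field names are assumptions, not the PR's actual code.
from dataclasses import dataclass

@dataclass
class PRSignals:
    breaking_risk: int    # 0-10: API changes, deprecations, changeset size
    security_risk: int    # 0-10: security keywords, sensitive file changes
    urgency: int          # 0-10: keywords, staleness, community engagement
    approvals: int        # number of approving reviews
    has_conflicts: bool   # merge conflicts present

def priority_score(s: PRSignals) -> int:
    """Combine risk factors, review status, and merge readiness into 0-100."""
    score = 4 * (s.breaking_risk + s.security_risk + s.urgency)
    score += 10 * min(s.approvals, 2)   # approved PRs float to the top
    if s.has_conflicts:
        score -= 15                     # conflicts reduce merge readiness
    return max(0, min(100, score))
```

A PR with moderate risk and two approvals lands in the "review this week" band, while conflicts pull an otherwise identical PR down the queue.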

2. PR Metadata Tracking

  • Staleness metrics (days open, days since update)
  • Merge conflict detection
  • Review status aggregation (approvals, changes requested)
  • Activity metrics (comments, recent engagement)

3. LLM-Powered Analysis Tools

  • llm_batch_analysis: Generates comprehensive prompts for LLM-based prioritization of multiple PRs
  • Includes detailed context: metadata, heuristic scores, activity metrics, and optional code diffs
  • Human-in-the-loop design: prints prompts for use with Cursor or other LLM interfaces
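A rough sketch of how such a batch prompt might be assembled from per-PR context; the real output format of llm_batch_analysis likely includes more fields (diffs, labels) than this minimal version:

```python
# Hedged sketch of batch-prompt assembly; the dict keys ("number", "title",
# "score", "approvals") are assumed fields, not the server's exact schema.
def build_batch_prompt(prs: list[dict]) -> str:
    header = (
        "You are triaging open PRs. For each PR, assign a priority score "
        "(0-100) and a one-line justification.\n"
    )
    sections = []
    for pr in prs:
        sections.append(
            f"PR #{pr['number']}: {pr['title']}\n"
            f"  heuristic score: {pr['score']}, approvals: {pr['approvals']}"
        )
    return header + "\n".join(sections)
```

The resulting string is printed rather than sent to an API, which is what keeps the design human-in-the-loop: the user pastes it into Cursor or another LLM interface.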

4. MCP Tool Suite

The server exposes six tools via the MCP protocol:

  • list_open_prs: Quick overview of open PRs with optional label filtering
  • analyze_single_pr: Deep analysis of a single PR
  • prioritize_prs: Heuristic-based prioritization of all open PRs
  • generate_llm_priority_prompt: Creates master prompts for manual LLM analysis
  • check_stale_prs: Identifies PRs that haven't been updated recently
  • risk_report: Generates risk breakdowns by category

Use Cases

  1. Daily PR Triage: Quickly identify which PRs need immediate attention
  2. Release Planning: Assess breaking change risks before releases
  3. Security Review: Flag PRs that may require security scrutiny
  4. Stale PR Cleanup: Find PRs that need maintainer follow-up or closure
  5. Strategic Planning: Understand patterns in the PR queue

Technical Implementation

  • Built with fastmcp and httpx for GitHub API integration
  • Requires GITHUB_TOKEN environment variable for API access
  • Structured dataclasses for type-safe analysis results
  • Configurable limits and thresholds for all analysis tools
  • Handles pagination and rate limiting for large PR queues
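The pagination and auth handling described above can be sketched as follows. The header format and per_page convention follow the GitHub REST API; the fetch callable is injected so the sketch stays testable without network access or a real GITHUB_TOKEN, and is an assumption rather than the server's actual structure:

```python
# Sketch of GITHUB_TOKEN auth headers and paged retrieval of open PRs.
# The fetch_page callable stands in for e.g. httpx.get(...).json().
import os

def github_headers() -> dict:
    token = os.environ.get("GITHUB_TOKEN", "")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }

def fetch_all_open_prs(fetch_page, per_page: int = 100) -> list[dict]:
    """Page through /repos/{owner}/{repo}/pulls until a short page appears."""
    prs, page = [], 1
    while True:
        batch = fetch_page(page, per_page)
        prs.extend(batch)
        if len(batch) < per_page:   # a short page means we hit the end
            return prs
        page += 1
```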

Example Usage

# Via MCP client (e.g., from Cursor)
# 1. Quick batch analysis
llm_batch_analysis(limit=20, min_priority=30)

# 2. Check for stale PRs
check_stale_prs(days_threshold=45)

# 3. Generate risk report
risk_report(min_risk_score=5)

Fixes # (issue).

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

This is a developer tooling enhancement and doesn't affect the core Lightning-Thunder functionality.

Did you have fun?

Make sure you had fun coding 🙃

I love thunder!!!

Steboss · Oct 29 '25 16:10