
[WIP]: Add support for an MCP to analyse and prioritise PRs

Steboss opened this issue 2 months ago • 0 comments

Before submitting
  • [ ] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • [x] Did you read the contributor guideline, Pull Request section?
  • [ ] Did you make sure to update the docs?
  • [ ] Did you write any new necessary tests?

What does this PR do?

This PR adds a Model Context Protocol (MCP) server for intelligent PR analysis and prioritization in the Lightning-Thunder repository. The server enables maintainers and contributors to efficiently triage and review the PR queue using both heuristic analysis and LLM-powered insights.

Users can generate both heuristic and LLM-based evaluations of the PRs to prioritize which PRs need to be reviewed as soon as possible.

After installing the MCP in Cursor as:

{
  "mcpServers": {
    "thunder-dev-mcp": {
      "type": "stdio",
      "command": "PATH TO YOUR PYTHON BIN",
      "args": ["/PATH TO/lightning-thunder/mcp_server/server.py"],
      "env": {
      }
    }
  }
}

it's possible to run the following examples:

  • List all open PRs: ask Cursor: "Can you list the open PRs for lightning-thunder?"
  • Analyze a specific PR: "Analyze PR #1234 from lightning-thunder"
  • Get prioritized PRs: "Show me the top priority PRs for lightning-thunder with a score above 40"
  • Check stale PRs: "Find all PRs that haven't been updated in 45 days"
  • Generate risk report: "Generate a risk report for all high-risk PRs (score >= 6)"
  • Advanced: Generate LLM priority prompt: "Generate an LLM priority prompt for PRs: 1234, 1235, 1236"
  • Run a full heuristic + LLM-based analysis of PRs: "Can you run an LLM batch analysis on all the open lightning-thunder PRs please?"

The last approach will return a summary like this:

### 1. LLM Priority Scores (0-100)
PR #2689: Delay Inductor until we get real input tensors
Priority Score: 85/100
This PR fixes a guard failure issue (#2680) with Inductor-compiled submodules and has 2 approvals. The change is targeted and addresses a real runtime bug affecting the Thunder-Inductor integration.
PR #2652: Move DTensor isinstance check from runtime to trace construction time
Priority Score: 75/100
Moves DTensor detection from runtime to trace construction, improving performance and correctness. Already has 1 approval and good engagement, addressing a real optimization opportunity.
PR #2675: Record last_trace_cache_stop when miss the cache
Priority Score: 70/100
Fixes a clear bug in cache profiling where stop time is only recorded on hits. Simple 2-line fix that improves observability and correctness.
PR #2697: benchmark_inference: Add CLI option to enable thunder CUDAGraph Transform
Priority Score: 60/100
Has 3 approvals and adds useful benchmarking capability, but has conflicts that need resolution. Testing/benchmarking infrastructure improvement.
PR #2695: benchmark_inference: Allow passing cache option as cli arg
Priority Score: 55/100
Similar to #2697, has 2 approvals but needs conflict resolution. Enhances benchmarking flexibility.
...

### 2. Prioritized Review Order
### 🔥 CRITICAL (Review Today):
PR #2689 - Delay Inductor until we get real input tensors
Why: Fixes guard failures, 2 approvals, no conflicts, addresses real bug
Action: Final review and merge
PR #2675 - Record last_trace_cache_stop when miss the cache
Why: 2-line bug fix in profiling, clear and correct
Action: Quick review and merge
PR #2652 - Move DTensor isinstance check from runtime to trace construction time
Why: Performance optimization, 1 approval, clean changes
Action: Final review and merge
...

### 🚨 HIGH (Review This Week):
PR #2553 - Fix gradient mismatch in checkpointing by preserving node metadata
Why: Correctness fix for checkpointing gradients, small surgical change
Action: Verify the fix and merge
PR #2697 & #2695 - Benchmark infrastructure improvements
Why: Both have approvals but need conflict resolution
Action: Resolve conflicts, then merge
PR #2305 - Add torch.scaled_mm
Why: Important for quantization workflows, active discussion
Action: Address remaining review comments and merge
...

### ⚠️ MEDIUM (Review When Possible):
PR #2184 - Advanced indexing support for n-dimensional integer indexing tensors
Why: Significant feature enhancement, needs fresh eyes after staleness
Action: Full review, verify correctness with comprehensive tests
PR #2071 - add liger kernel executor
Why: Strategic performance addition, needs CI fixes
Action: Help contributor fix CI, then review
PR #2340 - Add cutlass-python-dsl executor
Why: Large strategic addition, forward-only currently
Action: Assess backward support plan, then review
...

### 3. Key Recommendations
✅ Safe to Merge Immediately (after quick final check):
PR #2689 - Inductor timing fix (2 approvals)
PR #2675 - Cache profiling bug fix (clear 2-line fix)
PR #2652 - DTensor check optimization (1 approval)
PR #2698 - Re-enable zero-size test cases
PR #2678 - Pre-commit updates
🔧 Needs Changes Before Merging:
PR #2697, #2695 - Resolve merge conflicts
PR #2071 - Fix CI failures
PR #2305 - Address stride/dtype edge cases from review comments
PR #2585 - Needs thorough review and testing for large addition
🚧 Blockers for Other Work:
PR #2689 - Blocks any Inductor integration work
PR #2652 - Blocks DTensor performance work
...
### 4. Overall Assessment
Queue Health: MODERATE (⚠️)
The PR queue shows 50 open PRs with a concerning long tail of stale contributions. However, there are several high-quality, ready-to-merge PRs at the top of the funnel.
Positive Signs:
Multiple PRs have approvals and are ready to merge (#2689, #2697, #2695, #2652)
Recent activity on critical paths (Inductor integration, DTensor, benchmarking)
Good mix of bug fixes and feature additions
Active contributors addressing real issues
Concerning Patterns:
28 PRs are stale (>30 days since update) with no recent engagement
Large PRs (+500 lines) sitting without review (#2585, #2340, #2071, #2633)
Many "WIP" PRs with unclear status or completion plan
Linting PRs languishing despite being trivial to merge
Several deprecated/obsolete PRs that should be closed

Key Features

1. Multi-dimensional Heuristic Analysis

  • Risk scoring across three dimensions:
    • Breaking Changes: Detects API modifications, deprecations, and large changesets
    • Security: Identifies security-related keywords and sensitive file changes
    • Urgency: Assesses criticality based on keywords, staleness, and community engagement
  • Priority scoring (0-100) combining risk factors, review status, and merge readiness
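To make the scoring concrete, here is a minimal sketch of how risk dimensions and review status might combine into a 0-100 priority score. The field names and weights are illustrative assumptions, not the server's actual implementation:

```python
# Hypothetical sketch of the heuristic priority score described above.
# Weights and field names are assumptions, not the PR's actual code.
from dataclasses import dataclass

@dataclass
class PRSignals:
    breaking_risk: int    # 0-10: API changes, deprecations, changeset size
    security_risk: int    # 0-10: security keywords, sensitive file changes
    urgency: int          # 0-10: keywords, staleness, community engagement
    approvals: int        # number of approving reviews
    has_conflicts: bool   # merge conflicts present

def priority_score(s: PRSignals) -> int:
    """Combine risk factors, review status, and merge readiness into 0-100."""
    score = 4 * (s.breaking_risk + s.security_risk + s.urgency)
    score += 10 * min(s.approvals, 2)   # approved PRs float to the top
    if s.has_conflicts:
        score -= 15                     # conflicts reduce merge readiness
    return max(0, min(100, score))
```

A PR with moderate risk and two approvals lands in the "review this week" band, while conflicts pull an otherwise identical PR down the queue.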

2. PR Metadata Tracking

  • Staleness metrics (days open, days since update)
  • Merge conflict detection
  • Review status aggregation (approvals, changes requested)
  • Activity metrics (comments, recent engagement)

3. LLM-Powered Analysis Tools

  • llm_batch_analysis: Generates comprehensive prompts for LLM-based prioritization of multiple PRs
  • Includes detailed context: metadata, heuristic scores, activity metrics, and optional code diffs
  • Human-in-the-loop design: prints prompts for use with Cursor or other LLM interfaces
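A rough sketch of how such a batch prompt might be assembled from per-PR context; the real output format of llm_batch_analysis likely includes more fields (diffs, labels) than this minimal version:

```python
# Hedged sketch of batch-prompt assembly; the dict keys ("number", "title",
# "score", "approvals") are assumed fields, not the server's exact schema.
def build_batch_prompt(prs: list[dict]) -> str:
    header = (
        "You are triaging open PRs. For each PR, assign a priority score "
        "(0-100) and a one-line justification.\n"
    )
    sections = []
    for pr in prs:
        sections.append(
            f"PR #{pr['number']}: {pr['title']}\n"
            f"  heuristic score: {pr['score']}, approvals: {pr['approvals']}"
        )
    return header + "\n".join(sections)
```

The resulting string is printed rather than sent to an API, which is what keeps the design human-in-the-loop: the user pastes it into Cursor or another LLM interface.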

4. MCP Tool Suite

The server exposes six tools via the MCP protocol:

  • list_open_prs: Quick overview of open PRs with optional label filtering
  • analyze_single_pr: Deep analysis of a single PR
  • prioritize_prs: Heuristic-based prioritization of all open PRs
  • generate_llm_priority_prompt: Creates master prompts for manual LLM analysis
  • check_stale_prs: Identifies PRs that haven't been updated recently
  • risk_report: Generates risk breakdowns by category

Use Cases

  1. Daily PR Triage: Quickly identify which PRs need immediate attention
  2. Release Planning: Assess breaking change risks before releases
  3. Security Review: Flag PRs that may require security scrutiny
  4. Stale PR Cleanup: Find PRs that need maintainer follow-up or closure
  5. Strategic Planning: Understand patterns in the PR queue

Technical Implementation

  • Built with fastmcp and httpx for GitHub API integration
  • Requires GITHUB_TOKEN environment variable for API access
  • Structured dataclasses for type-safe analysis results
  • Configurable limits and thresholds for all analysis tools
  • Handles pagination and rate limiting for large PR queues
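The pagination and auth handling described above can be sketched as follows. The header format and per_page convention follow the GitHub REST API; the fetch callable is injected so the sketch stays testable without network access or a real GITHUB_TOKEN, and is an assumption rather than the server's actual structure:

```python
# Sketch of GITHUB_TOKEN auth headers and paged retrieval of open PRs.
# The fetch_page callable stands in for e.g. httpx.get(...).json().
import os

def github_headers() -> dict:
    token = os.environ.get("GITHUB_TOKEN", "")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }

def fetch_all_open_prs(fetch_page, per_page: int = 100) -> list[dict]:
    """Page through /repos/{owner}/{repo}/pulls until a short page appears."""
    prs, page = [], 1
    while True:
        batch = fetch_page(page, per_page)
        prs.extend(batch)
        if len(batch) < per_page:   # a short page means we hit the end
            return prs
        page += 1
```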

Example Usage

# Via MCP client (e.g., from Cursor)
# 1. Quick batch analysis
llm_batch_analysis(limit=20, min_priority=30)

# 2. Check for stale PRs
check_stale_prs(days_threshold=45)

# 3. Generate risk report
risk_report(min_risk_score=5)

Fixes # (issue).

PR review

Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.

This is a developer tooling enhancement and doesn't affect the core Lightning-Thunder functionality.

Did you have fun?

Make sure you had fun coding 🙃

I love thunder!!!

Steboss · Oct 29 '25 16:10