[WIP]: Add support for an MCP to analyse and prioritise PRs
## Before submitting
- [ ] Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
- [x] Did you read the contributor guideline, Pull Request section?
- [ ] Did you make sure to update the docs?
- [ ] Did you write any new necessary tests?
## What does this PR do?
This PR adds a Model Context Protocol (MCP) server for intelligent PR analysis and prioritization in the Lightning-Thunder repository. The server enables maintainers and contributors to efficiently triage and review the PR queue using both heuristic analysis and LLM-powered insights.
Users can generate both heuristic and LLM-based evaluations of open PRs to prioritize which ones need to be reviewed first.
After installing the MCP in Cursor as:

```json
{
  "mcpServers": {
    "thunder-dev-mcp": {
      "type": "stdio",
      "command": "PATH TO YOUR PYTHON BIN",
      "args": ["/PATH TO/lightning-thunder/mcp_server/server.py"],
      "env": {}
    }
  }
}
```

and exporting a `GITHUB_TOKEN` for GitHub API access (see Technical Implementation below), it's possible to ask Cursor things like:
- List all open PRs: "Can you list the open PRs for lightning-thunder?"
- Analyze a specific PR: "Analyze PR #1234 from lightning-thunder"
- Get prioritized PRs: "Show me the top priority PRs for lightning-thunder with a score above 40"
- Check stale PRs: "Find all PRs that haven't been updated in 45 days"
- Generate risk report: "Generate a risk report for all high-risk PRs (score >= 6)"
- Advanced: Generate LLM priority prompt: "Generate an LLM priority prompt for PRs: 1234, 1235, 1236"
- Run a full heuristic + LLM-based analysis of PRs: "Can you run an LLM batch analysis on all the open lightning-thunder PRs please?"
The last approach will return a summary like this:
### 1. LLM Priority Scores (0-100)
PR #2689: Delay Inductor until we get real input tensors
Priority Score: 85/100
This PR fixes a guard failure issue (#2680) with Inductor-compiled submodules and has 2 approvals. The change is targeted and addresses a real runtime bug affecting the Thunder-Inductor integration.
PR #2652: Move DTensor isinstance check from runtime to trace construction time
Priority Score: 75/100
Moves DTensor detection from runtime to trace construction, improving performance and correctness. Already has 1 approval and good engagement, addressing a real optimization opportunity.
PR #2675: Record last_trace_cache_stop when miss the cache
Priority Score: 70/100
Fixes a clear bug in cache profiling where stop time is only recorded on hits. Simple 2-line fix that improves observability and correctness.
PR #2697: benchmark_inference: Add CLI option to enable thunder CUDAGraph Transform
Priority Score: 60/100
Has 3 approvals and adds useful benchmarking capability, but has conflicts that need resolution. Testing/benchmarking infrastructure improvement.
PR #2695: benchmark_inference: Allow passing cache option as cli arg
Priority Score: 55/100
Similar to #2697, has 2 approvals but needs conflict resolution. Enhances benchmarking flexibility.
...
### 2. Prioritized Review Order
### 🔥 CRITICAL (Review Today):
PR #2689 - Delay Inductor until we get real input tensors
Why: Fixes guard failures, 2 approvals, no conflicts, addresses real bug
Action: Final review and merge
PR #2675 - Record last_trace_cache_stop when miss the cache
Why: 2-line bug fix in profiling, clear and correct
Action: Quick review and merge
PR #2652 - Move DTensor isinstance check from runtime to trace construction time
Why: Performance optimization, 1 approval, clean changes
Action: Final review and merge
...
### 🚨 HIGH (Review This Week):
PR #2553 - Fix gradient mismatch in checkpointing by preserving node metadata
Why: Correctness fix for checkpointing gradients, small surgical change
Action: Verify the fix and merge
PR #2697 & #2695 - Benchmark infrastructure improvements
Why: Both have approvals but need conflict resolution
Action: Resolve conflicts, then merge
PR #2305 - Add torch.scaled_mm
Why: Important for quantization workflows, active discussion
Action: Address remaining review comments and merge
...
### ⚠️ MEDIUM (Review When Possible):
PR #2184 - Advanced indexing support for n-dimensional integer indexing tensors
Why: Significant feature enhancement, needs fresh eyes after staleness
Action: Full review, verify correctness with comprehensive tests
PR #2071 - add liger kernel executor
Why: Strategic performance addition, needs CI fixes
Action: Help contributor fix CI, then review
PR #2340 - Add cutlass-python-dsl executor
Why: Large strategic addition, forward-only currently
Action: Assess backward support plan, then review
...
### 3. Key Recommendations
✅ Safe to Merge Immediately (after quick final check):
PR #2689 - Inductor timing fix (2 approvals)
PR #2675 - Cache profiling bug fix (clear 2-line fix)
PR #2652 - DTensor check optimization (1 approval)
PR #2698 - Re-enable zero-size test cases
PR #2678 - Pre-commit updates
🔧 Needs Changes Before Merging:
PR #2697, #2695 - Resolve merge conflicts
PR #2071 - Fix CI failures
PR #2305 - Address stride/dtype edge cases from review comments
PR #2585 - Needs thorough review and testing for large addition
🚧 Blockers for Other Work:
PR #2689 - Blocks any Inductor integration work
PR #2652 - Blocks DTensor performance work
...
### 4. Overall Assessment
Queue Health: MODERATE (⚠️)
The PR queue shows 50 open PRs with a concerning long tail of stale contributions. However, there are several high-quality, ready-to-merge PRs at the top of the funnel.
Positive Signs:
Multiple PRs have approvals and are ready to merge (#2689, #2697, #2695, #2652)
Recent activity on critical paths (Inductor integration, DTensor, benchmarking)
Good mix of bug fixes and feature additions
Active contributors addressing real issues
Concerning Patterns:
28 PRs are stale (>30 days since update) with no recent engagement
Large PRs (+500 lines) sitting without review (#2585, #2340, #2071, #2633)
Many "WIP" PRs with unclear status or completion plan
Linting PRs languishing despite being trivial to merge
Several deprecated/obsolete PRs that should be closed
## Key Features

1. Multi-dimensional Heuristic Analysis
   - Risk scoring across three dimensions:
     - Breaking Changes: Detects API modifications, deprecations, and large changesets
     - Security: Identifies security-related keywords and sensitive file changes
     - Urgency: Assesses criticality based on keywords, staleness, and community engagement
   - Priority scoring (0-100) combining risk factors, review status, and merge readiness (see the sketch below)
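For illustration, a combined 0-100 priority score could be computed along these lines; the weights and field names below are assumptions for the sketch, not the exact logic in `mcp_server/server.py`:

```python
# Illustrative only: the actual weights and field names in server.py may differ.
def heuristic_priority(pr: dict) -> int:
    """Combine risk, review status, and merge readiness into a 0-100 score."""
    score = 0
    # Risk dimensions (breaking changes, security, urgency), each assumed on a 0-10 scale.
    score += 3 * pr.get("breaking_risk", 0)
    score += 3 * pr.get("security_risk", 0)
    score += 2 * pr.get("urgency", 0)
    # Review status: approvals push a PR up, requested changes pull it down.
    score += 5 * pr.get("approvals", 0)
    score -= 5 * pr.get("changes_requested", 0)
    # Merge readiness: conflicts make a PR less actionable right now.
    if pr.get("has_conflicts"):
        score -= 10
    return max(0, min(100, score))
```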
2. PR Metadata Tracking (an illustrative record shape is sketched below)
   - Staleness metrics (days open, days since update)
   - Merge conflict detection
   - Review status aggregation (approvals, changes requested)
   - Activity metrics (comments, recent engagement)
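A minimal sketch of the kind of structured record this metadata could live in; the field names are illustrative, and the real dataclasses are defined in `mcp_server/server.py`:

```python
from dataclasses import dataclass

# Hypothetical shape of a per-PR record used by the analysis tools.
@dataclass
class PRMetadata:
    number: int
    title: str
    days_open: int
    days_since_update: int
    has_conflicts: bool
    approvals: int
    changes_requested: int
    comments: int
```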
3. LLM-Powered Analysis Tools
   - `llm_batch_analysis`: Generates comprehensive prompts for LLM-based prioritization of multiple PRs
   - Includes detailed context: metadata, heuristic scores, activity metrics, and optional code diffs
   - Human-in-the-loop design: prints prompts for use with Cursor or other LLM interfaces (a rough sketch of this flow follows)
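As a rough sketch of that human-in-the-loop flow, the batch tool can assemble a prompt from per-PR context and return it for pasting into Cursor; `build_batch_prompt` and the dictionary keys below are hypothetical:

```python
# Hypothetical helper: assemble an LLM prioritization prompt from per-PR context.
def build_batch_prompt(prs: list[dict], include_diffs: bool = False) -> str:
    lines = ["Score each of the following open PRs 0-100 for review priority and justify briefly:"]
    for pr in prs:
        lines.append(
            f"- PR #{pr['number']}: {pr['title']} "
            f"(heuristic priority {pr['priority']}, approvals {pr['approvals']}, "
            f"{pr['comments']} comments)"
        )
        if include_diffs and pr.get("diff"):
            # Keep the prompt bounded by truncating long diffs.
            lines.append(f"  diff excerpt:\n{pr['diff'][:2000]}")
    return "\n".join(lines)
```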
4. MCP Tool Suite

The server exposes 6 tools via the MCP protocol (a minimal registration sketch follows the list):
- `list_open_prs`: Quick overview of open PRs with optional label filtering
- `analyze_single_pr`: Deep analysis of a single PR
- `prioritize_prs`: Heuristic-based prioritization of all open PRs
- `generate_llm_priority_prompt`: Creates master prompts for manual LLM analysis
- `check_stale_prs`: Identifies PRs that haven't been updated recently
- `risk_report`: Generates risk breakdowns by category
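For context, registering such a tool with `fastmcp` generally looks like the minimal sketch below; the tool body is a placeholder, not the actual server implementation:

```python
from fastmcp import FastMCP

mcp = FastMCP("thunder-dev-mcp")

@mcp.tool()
def check_stale_prs(days_threshold: int = 30) -> str:
    """Identify PRs that haven't been updated in `days_threshold` days."""
    # Placeholder body: the real tool queries the GitHub API and formats a report.
    return f"PRs not updated in the last {days_threshold} days: ..."

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, matching the Cursor config above
```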
## Use Cases
- Daily PR Triage: Quickly identify which PRs need immediate attention
- Release Planning: Assess breaking change risks before releases
- Security Review: Flag PRs that may require security scrutiny
- Stale PR Cleanup: Find PRs that need maintainer follow-up or closure
- Strategic Planning: Understand patterns in the PR queue
## Technical Implementation

- Built with `fastmcp` and `httpx` for GitHub API integration
- Requires a `GITHUB_TOKEN` environment variable for API access
- Structured dataclasses for type-safe analysis results
- Configurable limits and thresholds for all analysis tools
- Handles pagination and rate limiting for large PR queues (sketched below)
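As an illustration of the paginated GitHub API access this implies, here is a minimal `httpx` sketch; the helper name and page limits are assumptions:

```python
import os
import httpx

GITHUB_API = "https://api.github.com/repos/Lightning-AI/lightning-thunder/pulls"

def fetch_open_prs(limit: int = 100) -> list[dict]:
    """Fetch open PRs page by page until `limit` is reached."""
    headers = {"Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}"}
    prs, page = [], 1
    with httpx.Client(headers=headers, timeout=30.0) as client:
        while len(prs) < limit:
            resp = client.get(
                GITHUB_API,
                params={"state": "open", "per_page": 100, "page": page},
            )
            resp.raise_for_status()
            batch = resp.json()
            if not batch:
                break
            prs.extend(batch)
            page += 1
    return prs[:limit]
```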
## Example Usage

```python
# Via MCP client (e.g., from Cursor)

# 1. Quick batch analysis
llm_batch_analysis(limit=20, min_priority=30)

# 2. Check for stale PRs
check_stale_prs(days_threshold=45)

# 3. Generate risk report
risk_report(min_risk_score=5)
```
Fixes # (issue).
## PR review
Anyone in the community is free to review the PR once the tests have passed. If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
This is a developer tooling enhancement and doesn't affect the core Lightning-Thunder functionality.
## Did you have fun?
Make sure you had fun coding 🙃
I love thunder!!!