Initial tools/utils/ iree-* developer utilities.
Developer-facing utilities for IREE contributors, organized as:
Lit Test Tools (tools/utils/bin/iree-lit-*):
- iree-lit-list: List test cases in files
- iree-lit-extract: Extract individual test cases
- iree-lit-replace: Replace test case content atomically
- iree-lit-test: Run tests in isolation with debug capabilities
- iree-lit-lint: Lint tests against STYLE_GUIDE.md
CI Triage Tools (tools/utils/bin/iree-ci-*):
- iree-ci-triage: Analyze CI failures from GitHub Actions
- iree-ci-garden: Manage failure corpus for pattern development
Use --help on any tool for detailed usage.
See tools/utils/README.md for full documentation.
Setup:
pip install -e tools/utils
export PATH="$PATH:$PWD/tools/utils/bin"
# or, if you use direnv:
cp .envrc.example .envrc
direnv allow
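To sanity-check the setup, any of the tools should respond once they're on PATH (this just exercises the --help entry point mentioned above):
iree-lit-list --help
iree-ci-triage --help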
Claude Code Integration:
- Skill: iree-lit-tools for MLIR test authoring guidance
- Commands: /iree-ci-triage, /iree-lit-test, /iree-lit-lint
Skills and commands should be detected automatically.
auto-approve-iree-tools.sh is a useful pre-Bash hook that auto-approves all iree-* tools. It's somewhat #yolo, so it's opt-in, but it makes for much less naggy triage sessions and is required for smooth heredoc support. Run cp .claude/settings.json.template .claude/settings.json to enable it.
Tool Reference
iree-lit-list
List test cases in a file.
iree-lit-list test.mlir # Show all cases with metadata
iree-lit-list test.mlir --count # Just the count
iree-lit-list test.mlir --names # Space-separated names
iree-lit-list test.mlir --json # Machine-readable
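A small combined example (illustrative only; it just chains the flags documented here with iree-lit-test below) that runs each case by name to find which one regressed:
for name in $(iree-lit-list test.mlir --names); do
  iree-lit-test test.mlir --name "$name" || echo "FAILED: $name"
done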
iree-lit-extract
Extract individual test cases.
iree-lit-extract test.mlir --case 3 # Extract case #3
iree-lit-extract test.mlir --line 123 # Extract case containing line 123
iree-lit-extract test.mlir --name "foo" # Extract case named "foo"
iree-lit-extract test.mlir -c 1,3,5 # Multiple cases
iree-lit-replace
Replace a test case atomically (reads new content from stdin).
iree-lit-replace test.mlir --case 3 < fixed_case.mlir
# Heredoc for inline replacement (no temp files needed):
iree-lit-replace test.mlir --case 3 <<'EOF'
// RUN: iree-opt --my-pass %s | FileCheck %s
// CHECK-LABEL: @my_test
func.func @my_test() {
  return
}
EOF
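A typical round trip looks like this (a sketch; it assumes iree-lit-extract writes the extracted case to stdout, which isn't spelled out above):
iree-lit-extract test.mlir --case 3 > /tmp/case3.mlir # assumes extraction goes to stdout
# ... edit or regenerate /tmp/case3.mlir ...
iree-lit-replace test.mlir --case 3 < /tmp/case3.mlir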
iree-lit-test
Run tests in isolation with debug capabilities.
iree-lit-test test.mlir # Run all cases
iree-lit-test test.mlir --case 2 # Run only case #2
iree-lit-test test.mlir -c 1-3 # Run cases 1, 2, 3
iree-lit-test test.mlir --name "foo" # Run case named "foo"
iree-lit-test test.mlir --verbose # Show full output
iree-lit-test test.mlir --extra-flags "--debug" # Inject flags
iree-lit-test test.mlir --dry-run # Show commands without running
# Heredoc for rapid testing (no temp files needed):
iree-lit-test --run 'iree-opt --my-pass %s | FileCheck %s' <<'EOF'
// CHECK-LABEL: @test_fusion
// CHECK: my.fused_op
func.func @test_fusion() {
  %0 = my.op1
  %1 = my.op2 %0
  return
}
EOF
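A workable debug loop for a single failing case (just combining the flags above; --mlir-print-ir-after-all is only an example of a flag you might inject):
iree-lit-test test.mlir --case 2 --dry-run # show the command that would run
iree-lit-test test.mlir --case 2 --verbose --extra-flags "--mlir-print-ir-after-all"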
iree-lit-lint
Lint tests against the style guide.
iree-lit-lint test.mlir # Lint all cases
iree-lit-lint test.mlir --case 2 # Lint only case #2
iree-lit-lint test.mlir --errors-only # Only show errors
iree-lit-lint test.mlir --help-style-guide # Show full style guide
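To sweep a directory (illustrative; the path is just an example), lint each file individually and only surface errors:
find compiler/src -name '*.mlir' | xargs -n1 iree-lit-lint --errors-only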
(Most of this was authored with Claude and dogfooded on significant test refactoring/authoring. YMMV and it needs refinement, but it's already been a massive help.)
Yeah, I don't expect anyone to read this :P If you wanted to give it a shot, that'd be useful, though! (I've tried it on two machines, but they're pretty similar.)
RE 1: Good question! I'm not going to allow any context to land in the repo :) See https://github.com/iree-org/iree/blob/14c86589bb46f031da51aebc94f247f6fd7f0300/.claude/README.md (I wanted to call it HUMANS.md, but then GitHub wouldn't render it by default). I am an extreme context miser, so the bar is very high. Instead of people dropping pytorch-quality crap, we're probably just going to set up an iree-org/claude-plugins repo or something and then we can add as many random things as we want there, ala https://github.com/anthropics/claude-code/tree/main/plugins. The skills checked in are all-developers/nearly-all-of-the-time, and there aren't very many of those. There also aren't very many commands that meet that bar. Basically, local is always best for stuff, and if you want to share in more than a gist, a plugins repo is the best way.
RE 2: I'd expect Claude to fix them if there were any issues; the code is really well set up for that. As for brittleness: I tried to make everything generic, so it's just GitHub API calls to query jobs, their status/check reports, and fetch their logs. Workflow changes won't impact anything (though I do have our "summary" workflows hidden by default), which is nice (it'll work in forks with other workflows, etc.). The error extraction from the logs is all CI-agnostic and it can run on local builds to triage issues as well, and it's all best-effort (we miss some today, but I've got quite a few of the major things like cmake/bazel/onnx/lit/mlir). The intent is that someone (me) runs iree-ci-garden to grab all runs, does a classification pass to see if there are any new failures we couldn't identify, and then has Claude go figure them out using the documentation/guides/etc. I'd love to make it a nightly GitHub Action at some point so it could send PRs for novel failures, but things are quite slow-moving in infra land and most of the errors are stable (as stable as textual errors are). Worst case if nothing works you still get a list of failed jobs and their logs downloaded so pointing an agent at them is stupid easy. Same with /iree-ci-triage PR# - worst case you at least have the logs local and can start working on them.
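For reference, the manual fallback described above is plain gh; something like this (a sketch, not what the tools run internally) gets the same failed-job logs locally for an agent to chew on:
gh pr checks <PR#> # list failing checks on a PR
gh run list --status failure --limit 5 # recent failed workflow runs
gh run view <run-id> --log-failed > failed.log # grab the failing-step logs for one run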
RE 1: Good question! I'm not going to allow any context to land in the repo :)
Great, that works for me. I find myself reorganizing the kind of context available for hyper-specific agents/tasks a lot, and having to deal with an extra file inserting itself into my workflow would be annoying. Just not checking any in sounds great.
Instead of people dropping pytorch-quality crap, we're probably just going to set up an iree-org/claude-plugins repo or something and then we can add as many random things as we want there
This was exactly what I was going to suggest if we hit the point of needing it. SGTM
RE 2
Sounds good. As long as it stays as generic as possible and non-blocking, no complaints from me.
Worst case if nothing works you still get a list of failed jobs and their logs downloaded so pointing an agent at them is stupid easy. Same with /iree-ci-triage PR# - worst case you at least have the logs local and can start working on them.
Yeah this is what I've been doing and it'll always work.
It's late for me, but I'll finish a review tomorrow. I still want to read the contents of .claude on review, and as a human I'm still a little apprehensive of checking in so much code from bots :P. The tools were great when I tried them before, though, so I'll try to review soon.
as a human I'm still a little apprehensive of checking in so much code from bots :P
Me too! I went pretty hard on the testing, style enforcement (ruff + custom rules), and documentation (style guides, developer docs, claude docs, and usage), and spent a decent amount of time refining it over the last few weeks. Even though there are certainly issues, I stand by it enough to submit it under my name: it's better than I could have written solo and it's now our most robust piece of infrastructure :P My tolerance for AI slop is quite low, so honestly my biggest fear with landing something like this has been that someone will think it's tacit approval to throw slop in the repo, when it most definitely isn't.
I'm going to call for splitting the PR on principle
Nah, not worth it.
tools/utils/ distribution (47,051 total lines)

| Category | Lines | Files | % |
| --- | ---: | ---: | ---: |
| Code (*.py) | 21,767 | 66 | 46% |
| Tests (test_*.py) | 19,658 | 46 | 42% |
| Documentation (*.md) | 3,894 | 12 | 8% |
| Fixtures | 1,595 | 24 | 3% |
| Bin wrappers | 137 | 7 | <1% |
Roughly 46% code, 45% tests+fixtures, 8% docs.
As an admin, I'm more than happy with that. Since I don't expect anyone to be reading these PRs (unless you're volunteering to spend the next few weeks reviewing them ;), there's no value in splitting. These are no-ops to the project (besides pre-commit) as they have no dependencies, so there's no real risk.
I'm going to put a tentative soft block on this one until at least one non-Ben human has read the code and has some understanding of what it's all doing. I'm willing to be talked out of this over the next few days, though.
(I could take the few days needed to be said human, if it came down to it. And I'd just like to state my position that, if this PR wasn't by someone I trust to have taken this process seriously, I would be closing it as unreviewable right now.)
Noted, and post-submission patches/reviews are very welcome as others start to use it.