refactor: downgrade MLIR generic parser from syntax categories to closed syntax
Currently, almost all parts of the generic MLIR parser use syntax categories, meaning these parts of the (generic!) grammar are in theory extensible.
In theory this can be very convenient: if a dialect wanted to customize only its operands, it could hook into just that part of the parser. In practice, I would expect that making even the low-level elements (such as mlir_op_operand) extensible is an anti-pattern:
- Such extensions would not be scoped to just that dialect; they would affect parsing of all generic MLIR syntax.
- The parser doesn't really seem to be written for such extensions anyway. I tried to come up with an example of extending just the operands, but the most obvious ways that came to mind didn't actually work (e.g. macro "&" n:num : mlir_op_operand => `(mlir_op_operand| % $n:num)).
- Controlling which parts of the grammar are extensible would make it easier to eventually write an elaborator that outputs a meta-level AST, instead of a Lean expression of type AST, if we eventually decide to go that way.
Thus, I propose we keep mlir_op (and potentially mlir_type) as the top-level syntax categories that can be hooked into, but make the low-level elements of the grammar closed syntax. This PR is still a draft, but it showcases what this change would look like for the case of mlir_op_operand.
This matches Lean itself, where term and command are extensible, but things like def are closed syntax.
Also note that this is exactly the level at which the current LLVM pretty syntax operates: it only extends mlir_op and would still work as-is after this refactor.
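For concreteness, here is a minimal Lean 4 sketch of the two flavors. The names (mlir_op_operand_open, mlir_op_operand_closed) and the grammar fragments are simplified placeholders for illustration, not the project's actual definitions:

```lean
import Lean

-- Open: a syntax category that any downstream file can extend with new rules.
declare_syntax_cat mlir_op_operand_open
syntax "%" ident : mlir_op_operand_open
-- (anyone could later add, e.g.:  syntax "&" num : mlir_op_operand_open)

-- Closed: a syntax abbreviation; the set of accepted forms is fixed right here.
syntax mlir_op_operand_closed := "%" ident

-- mlir_op stays a category, so dialects can still hook in at the op level.
declare_syntax_cat mlir_op
syntax str "(" mlir_op_operand_closed,* ")" : mlir_op
```

The closed form can still be referenced from other rules (as in the last line), but it cannot be extended after the fact.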
@goens @bollu @tobiasgrosser I'm curious what you think about this refactor?
Alive Statistics: 54 / 93 (39 failed)
I think this is a great idea.
@alexkeizer thanks for thinking about this, I'm generally happy with this change in terms of API surface area.
I don't quite understand:
> Controlling which parts of the grammar are extensible would make it easier to eventually write an elaborator that outputs a meta-level AST, instead of a Lean expression of type AST, if we eventually decide to go that way.
Could you elaborate on this? Is it that each syntax category is a constructor for our meta-level AST, and having many constructors makes this unmanageable?
So what I'm thinking about is having a function like TSyntax `mlir_op_operand -> MetaM SSAVal.[^1] If mlir_op_operand is closed, we can just match on exactly the syntax we define, and nothing else is possible. If it's a category, though, we're promising the syntax is extensible; we would then have to somehow hook into macro expansion to make that promise actually work.
Of course, we could just say that, even though it's a category, it's not actually intended to be extensible and throw some kind of error if we encounter unexpected syntax, but I'd rather codify this.
[^1]: With the caveat that SSAVal might have to become some MetaSSAVal that can possibly store a Lean expression for something, to support antiquotations
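To make the shape of that function concrete, here is a rough sketch with toy stand-ins; the SSAVal placeholder, the one-rule grammar, and elabOperand are hypothetical and much simpler than the real thing, and it assumes quotations over a syntax abbreviation work the same way as over a category:

```lean
import Lean
open Lean Meta

-- Toy stand-ins, not the project's real definitions.
syntax mlir_op_operand := "%" ident   -- closed syntax

inductive SSAVal where
  | name : String → SSAVal

-- With closed syntax, the match covers exactly the forms declared above; the
-- catch-all should be unreachable for syntax produced by our own parser.
def elabOperand : TSyntax `mlir_op_operand → MetaM SSAVal
  | `(mlir_op_operand| % $name:ident) => pure (SSAVal.name name.getId.toString)
  | stx => throwErrorAt stx.raw "unexpected mlir_op_operand syntax"
```

The catch-all arm is the codified promise: since mlir_op_operand is closed, nothing outside this file can add forms that would reach it.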
oh that's pretty cool! I didn't know about this difference (between closed syntax and extensible syntax categories).
I think it makes a lot of sense, thanks for the very thorough explanation @alexkeizer!
About the meta-level AST, I also don't quite understand the point. Even if we choose to do things at elaboration time, what's the drawback of using our (current) AST data structure?
Or do you mean just that we can do (full) pattern matching on the closed syntax, so it's about the TSyntax mlirsoandso types? When talking about these self-made compilers from custom syntax, Mac once asked us why we wanted our own data structure rather than just using TSyntax. Maybe using closed syntax is what we were missing for that, if I'm understanding you correctly.
> About the meta-level AST, I also don't quite understand the point. Even if we choose to do things at elaboration time, what's the drawback of using our (current) AST data structure?
So the main point is to correctly communicate what is and isn't supported.
With a closed syntax we know exactly what forms of syntax to expect when we pattern-match on the TSyntax. I don't think the exhaustiveness check knows about this, so it definitely still makes sense to parse the TSyntax into our AST data structure, but this parsing would have to pattern-match on the syntax to do so, and presumably throw an error when it sees syntax that doesn't match the options we've given (which, if we wrote the match correctly, should never happen, because we control what syntax gets parsed).
An option that totally works is to have a syntax category and only match on the syntax we've defined. Then, when the category is extended and this new syntax is actually used, we just throw an error saying "well, this syntax category is not actually meant to be extended". This option would be the easiest to implement, but it's unsatisfying IMO.
If we have syntax categories and would like to properly support extensions, then in the function that parses the TSyntax to produce our AST we'd have to hook into the macro expansion system before pattern matching. This seems excessively complicated.
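For contrast, a rough sketch of what that extra hook might look like if the category stayed open, reusing the toy stand-ins from above (hypothetical names; this assumes expandMacros and liftMacroM behave as in current Lean 4, and runs in TermElabM, where such lifting is available):

```lean
import Lean
open Lean Elab Term

-- Toy stand-ins, not the project's real definitions; here the operand
-- syntax is an open category rather than closed syntax.
declare_syntax_cat mlir_op_operand
syntax "%" ident : mlir_op_operand

inductive SSAVal where
  | name : String → SSAVal

-- Supporting extensions means letting user-registered macros fire before we
-- match on the built-in forms.
def elabOperandOpen (stx : TSyntax `mlir_op_operand) : TermElabM SSAVal := do
  let expanded : TSyntax `mlir_op_operand := ⟨← liftMacroM <| expandMacros stx.raw⟩
  match expanded with
  | `(mlir_op_operand| % $name:ident) => pure (SSAVal.name name.getId.toString)
  | _ => throwErrorAt stx.raw "unsupported mlir_op_operand syntax"
```

Closing the syntax lets the elaborator skip that expansion step entirely, which is the trade-off being weighed here.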
A slight road bump: apparently the machinery in the opt executable that parses things at runtime works specifically with a category parser. I'm not sure how to change that to work with a ParserDescr (as obtained from the closed syntax). There's compileParserDescr, but this runs in ImportM. I might keep mlir_region as a category for now to avoid dealing with this.
@alexkeizer, let me know if you would like to revive this.
No updates for a while. Let's close this to keep our PRs clean.