refactor: downgrade MLIR generic parser from syntax categories to closed syntax
Currently, almost all parts of the generic MLIR parser use syntax categories, meaning these parts of the (generic!) grammar are in theory extensible.
In theory this can be very convenient: if a dialect wanted to customize only its operands, it could hook into just that part of the parser. In practice, I would expect that making even the low-level elements (such as mlir_op_operand) extensible is an anti-pattern:
- Such extensions would not be scoped to just that dialect; they would affect parsing of all generic MLIR syntax.
- The parser doesn't really seem to be written for such extensions anyway. I tried to come up with an example of extending just the operands, but the most obvious ways that came to mind didn't actually work (e.g. macro "&" n:num : mlir_op_operand => `(mlir_op_operand| % $n:num)).
- Controlling which parts of the grammar are extensible would make it easier to eventually write an elaborator that outputs a meta-level AST, instead of a Lean expression of type AST, if we eventually decide to go that way.
Thus, I propose we keep mlir_op (and potentially mlir_type) as the top-level syntax categories that can be hooked into, but make the low-level elements of the grammar closed syntax. This PR is still a draft, but it showcases what this change would look like for the case of mlir_op_operand.
This matches Lean itself, where term and command are extensible, but things like def are closed syntax.
Also note that this is exactly the level at which the current LLVM pretty syntax operates: it only extends mlir_op and would still work as-is after this refactor.
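For concreteness, here is a minimal Lean 4 sketch of the two flavors. The names (mlir_op_operand_open, mlir_op_operand_closed) and the grammar fragments are simplified placeholders for illustration, not the project's actual definitions:

```lean
import Lean

-- Open: a syntax category that any downstream file can extend with new rules.
declare_syntax_cat mlir_op_operand_open
syntax "%" ident : mlir_op_operand_open
-- (anyone could later add, e.g.:  syntax "&" num : mlir_op_operand_open)

-- Closed: a syntax abbreviation; the set of accepted forms is fixed right here.
syntax mlir_op_operand_closed := "%" ident

-- mlir_op stays a category, so dialects can still hook in at the op level.
declare_syntax_cat mlir_op
syntax str "(" mlir_op_operand_closed,* ")" : mlir_op
```

The closed form can still be referenced from other rules (as in the last line), but it cannot be extended after the fact.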
@goens @bollu @tobiasgrosser I'm curious what you think about this refactor?
Alive Statistics: 54 / 93 (39 failed)
I think this is a great idea.
@alexkeizer thanks for thinking about this, I'm generally happy with this change in terms of API surface area.
I don't quite understand:
> Controlling which parts of the grammar are extensible would make it easier to eventually write an elaborator that outputs a meta-level AST, instead of a Lean expression of type AST, if we eventually decide to go that way.
Could you elaborate on this? Is it that each syntax category is a constructor for our meta-level AST, and having many constructors makes this unmanageable?
So what I'm thinking about is having a function like TSyntax `mlir_op_operand -> MetaM SSAVal.[^1] If mlir_op_operand is closed, we can just match on exactly the syntax we define, and nothing else is possible. If it's a category, though, we're promising the syntax is extensible; we would then have to somehow hook into macro expansion to make that promise actually work.
Of course, we could just say that, even though it's a category, it's not actually intended to be extensible and throw some kind of error if we encounter unexpected syntax, but I'd rather codify this.
[^1]: With the caveat that SSAVal might have to become some MetaSSAVal that can possibly store a Lean expression for something, to support antiquotations
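To make the shape of that function concrete, here is a rough sketch with toy stand-ins; the SSAVal placeholder, the one-rule grammar, and elabOperand are hypothetical and much simpler than the real thing, and it assumes quotations over a syntax abbreviation work the same way as over a category:

```lean
import Lean
open Lean Meta

-- Toy stand-ins, not the project's real definitions.
syntax mlir_op_operand := "%" ident   -- closed syntax

inductive SSAVal where
  | name : String → SSAVal

-- With closed syntax, the match covers exactly the forms declared above; the
-- catch-all should be unreachable for syntax produced by our own parser.
def elabOperand : TSyntax `mlir_op_operand → MetaM SSAVal
  | `(mlir_op_operand| % $name:ident) => pure (SSAVal.name name.getId.toString)
  | stx => throwErrorAt stx.raw "unexpected mlir_op_operand syntax"
```

The catch-all arm is the codified promise: since mlir_op_operand is closed, nothing outside this file can add forms that would reach it.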
oh that's pretty cool! I didn't know about this difference (between closed syntax and extensible syntax categories).
I think it makes a lot of sense, thanks for the very thorough explanation @alexkeizer!
About the meta-level AST, I also don't quite understand the point. Even if we choose to do things at elaboration time, what's the drawback of using our (current) AST data structure?
Or do you mean just that we can do (full) pattern matching on the closed syntax, so it's about the TSyntax mlirsoandso types? When talking about these self-made compilers from custom syntax, Mac once asked us why we wanted our own data structure rather than just using TSyntax. Maybe using closed syntax is what we were missing for that, if I'm understanding you correctly.
> About the meta-level AST, I also don't quite understand the point. Even if we choose to do things at elaboration time, what's the drawback of using our (current) AST data structure?
So the main point is to correctly communicate what is and isn't supported.
With a closed syntax we know exactly what forms of syntax to expect when we pattern-match on the TSyntax. I don't think the exhaustiveness check knows about this, so it definitely still makes sense to parse the TSyntax into our AST data structure, but this parsing would have to pattern-match on the syntax to do so, and presumably throw an error when it sees syntax that doesn't match the options we've given (which, if we wrote the match correctly, should never happen, because we control what syntax gets parsed).
An option that totally works is to have a syntax category and only match on the syntax we've defined. Then, when the category is extended and this new syntax is actually used, we just throw an error saying "well, this syntax category is not actually meant to be extended". This option would be the easiest to implement, but it's unsatisfying IMO.
If we have syntax categories and would like to properly support extensions, then in the function that parses the TSyntax to produce our AST we'd have to hook into the macro expansion system before pattern matching. This seems excessively complicated.
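For contrast, a rough sketch of what that extra hook might look like if the category stayed open, reusing the toy stand-ins from above (hypothetical names; this assumes expandMacros and liftMacroM behave as in current Lean 4, and runs in TermElabM, where such lifting is available):

```lean
import Lean
open Lean Elab Term

-- Toy stand-ins, not the project's real definitions; here the operand
-- syntax is an open category rather than closed syntax.
declare_syntax_cat mlir_op_operand
syntax "%" ident : mlir_op_operand

inductive SSAVal where
  | name : String → SSAVal

-- Supporting extensions means letting user-registered macros fire before we
-- match on the built-in forms.
def elabOperandOpen (stx : TSyntax `mlir_op_operand) : TermElabM SSAVal := do
  let expanded : TSyntax `mlir_op_operand := ⟨← liftMacroM <| expandMacros stx.raw⟩
  match expanded with
  | `(mlir_op_operand| % $name:ident) => pure (SSAVal.name name.getId.toString)
  | _ => throwErrorAt stx.raw "unsupported mlir_op_operand syntax"
```

Closing the syntax lets the elaborator skip that expansion step entirely, which is the trade-off being weighed here.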
A slight road bump: apparently the machinery in the opt executable that parses things at runtime works specifically with a category parser. I'm not sure how to change that to work with a ParserDescr (as obtained from the closed syntax). There's compileParserDescr, but this runs in ImportM. I might keep mlir_region as a category for now to avoid dealing with this.
@alexkeizer, let me know if you would like to revive this.
No updates for a while. Let's close this to keep our PRs clean.