circt [HW] Rework InnerSym infra to support nested symbol tables

trafficstars

This PR aims to rework the existing InnerSym infrastructure to properly support nested symbol tables. This is a feature that we'll be needing over on the Ibis dialect side, and which probably will be useful in the future if/when other dialects also need >1 levels of symbol table nesting, with InnerSym-like lookup semantics.

The change is backwards-compatible with the existing InnerSym infrastructure; hence, no changes in any .mlir FIRRTL or HW tests, and only minimal changes in some FIRRTL and HW-related files.

Changes:

InnerSymRef can now specify a path of symbol references (<@A::@B::@C)
The InnerSymbolTable trait is now an interface that inherits from InnerSymbol; i.e. InnerSymbolTables must now themselves define a symbol (to be able to be referenced in a path).
InnerSymbolTables must now be nested directly within either an InnerRefNamespaceLike or InnerSymbolTableOpInterface parent operation.
InnerSymbolTable operations are now required to implement either SymbolOpInterface or InnerSymbolOpInterface; previously, only SymbolOpInterface was allowed/required (disallowing nesting).
Somewhat in extension of the above, InnerRefNamespace now no longer requires the operation that it's attached to to be a SymbolTable - since relying on this would imply that InnerSymbolTable-defining operations always are mlir::SymbolOp... which they are not, due to the above point. AFAICT, the SymbolTable constraint was mainly used in InnerRefNamespace to scan top-level operations. Now, we "just" do a manual scan for InnerSymbolTable-defining operations within the InnerRefNamespace. To be seen whether the performance impact of this is significant.
Partly motivated by 5., InnerRefNamespace is now a stand-alone analysis. This implies that it no longer takes SymbolTable& and InnerSymbolTableCollection& references (a design choice which i don't fully understand, since it allows users to create completely unrelated symbol table to InnerSymbolTable collection relations). The refactor for doing this was fairly small, and I didn't find any usecases that indicated that the prior separation was absolutely necessary.
... which implies that InnerSymbolTableCollection is now private to the InnerRefNamespace implementation, and all prior uses of InnerSymbolTableCollection now uses an InnerRefNamespace - analysis (execution) times should (hopefully) be identical, it's just a matter of how the symbol table collection is wrapped.
static InnerSymbolTable::walkSymbols now takes an additinal (optional) callback that is called for each InnerSymbolTable encountered; currently, this is used to build an InnerSymbolTable.
- Doesn't use the built-in Operation::walk function; instead, implements it's own walk s.t. it can track the currently active NestedSymbolTable.

Because I like graphs, i decided to maintain the nested structure as a graph of InnerSymbolTables. This implies that each InnerSymbolTable maintains the symbols defined directly within their scope, as well as any nested symbol tables. Lookups will therefore be performed recursively into the nested symbol tables. An alternative designpoint would have been to only maintain a single, top-level InnerSymbolTable which maintains its symbol table as a DenseMap<hw::InnerRefAttr, InnerSymTarget>; a design which implies that all symbols are recorded in a single place with their fully qualified name (the InnerRefAttr). Given the single namespace, it would most likely also imply a slightly faster lookup time for (deeply) nested symbols. However, since we don't currently have any significant performance requirements for deeply nested symbol tables (all existing firrtl/hw usecases uses a single level of nesting, so lookup times should be identical to prior to this PR), I'd personally err on the side of readability and extendability, and go with the graph-like structure - fight me in the comment section :).

Finally, the graph-like structure is also implemented using double linking; that is, child InnerSymbolTables have a reference to their parent symbol table. This is somewhat leftover of my initial design which supported relative references. I personally think it's a nice thing to have, but since it isn't strictly necessary it could be removed if someone deems it unacceptable.

Apr 19 '24 12:04 mortbopet

Alrightey, dove in and

changed hw::InnerRef leaf/getName() to getTarget
changed hw::InnerRef::getModule to getRoot
Removed doubly-linking of innersymboltables
Some prose updated in the rationale doc

@teqdruid @darthscsi @dtzWill feel free to re-review.

Apr 23 '24 09:04 mortbopet

@darthscsi I'm hoping i made it clear in the rationale changes that InnerRefNamespace no longer is required to be a SymbolTable... It can be, it's just not going to be used by any of the lookup code (InnerRefNamespace now implements it's own scan of its inner region for InnerSymbolTable-defining ops).

As i see it, InnerRefNamespace cannot anymore rely on the "top-level" InnerSymbolTables being discoverable through a SymbolTable since this would imply that the top-level operations always implement an MLIR symbol. This assumption works for HW and FIRRTL, but not for Ibis, where, for instance, we are using ibis.container as the nest-able thing, that is, our InnerSymbolTable operation. If ibis.container were to implement an MLIR symbol, this would imply that an ibis.container itself would have to be a SymbolTable which would defeat the whole purpose of this.

Instead, the provided change to InnerRefNamespace/InnerSymbolTable requires that an InnerSymbolTable-defining operation is either an MLIR symbol or an InnerSymbol.

InnerRefNamespace will therefore scan the top-level "manually", i.e. not using a SymbolTable analysis.

Apr 24 '24 15:04 mortbopet

@dtzSiFive, @youngar, @teqdruid I'd really appreciate some eyes on this if you have time. I'm leaving for some time off come next week, and we have some internal work requiring this change, so I'm hoping to get this change in before then! 🙏

May 01 '24 08:05 mortbopet

Hey just want to let you know that I and other are looking at this PR and have been thinking about it at a high level, especially how should regular Symbols and InnerSymbols interact. One thing that would be helpful is if the PR description was updated to cover how this works, or if the rationale captured these details. I think a change that you will be making is that an InnerRef should encode whether each ref in the array is a SymbolRef or an InnerSymbolRef.

I think one of the use cases needed is that you want an ibis.container op that is both an InnerSymbol and InnerSymbolTable, and can live directly under a IRN op, and they are nest-able, is that right?

circt.container [IRN] {
  ibis.container @Outer [IS, IST] {
    ibis.container @Inner [IS, IST] { }
  }
}

Can a SymbolTable op live under an InnerSymbolTable, and how do reference through those tables work? I think its important that we make sure we don't break something like SymbolDCE by resolving references in a different way.

  ; This example has a ref that wants to resolve a SymbolRef through an InnerSymbolTable.
  ; The SR will be resolved according to the parent SymbolTable (`circt.container`), not `@Bar` like intended. 
circt.container IRN, IST {
  hw.sym_table_op @Bar ST, IS {
    hw.module @Baz S { }
  }
  hw.ref InnerRef[InnerSymbolRef<"Bar">, SymbolRef<"Baz">]
}

It seems to me that it only makes sense for SymbolRefs to appear at the root of an InnerRefs, otherwise we are inevitably changing the resolution rules for SymbolRefs.

InnerSymbols support per-field symbols on aggregate objects, and I was wondering how that works with InnerSymbolTables. Does it make sense to have per-field InnerSymbolTables, or maybe SymbolTables can only be used when the op's InnerSymbol is not associated with a result and does not support per-field symbols (see getTargetResultIndex and supportsPerFieldSymbols). When these aggregate ops are split into multiple ops (by something like FlattenIO/FIRRTLLowerTypes), where does that symbol table go?

May 02 '24 20:05 youngar

What happens when an op is both a Symbol and InnerSymbol, are they exclusive?

Can an op be both a SymbolTable and InnerSymbolTable?

May 02 '24 20:05 youngar

Depending on how complex this is, it might make sense to create some test ops just for writing tests for this code.

May 02 '24 21:05 youngar

Symbol visibility usually determines if SymbolTables expose internal symbols, e.g. a private symbol table does not expose public symbols underneath it. Does this work for InnerSymbols and InnerSymbolTables? (And Symbols+InnerSymbolTables) https://mlir.llvm.org/docs/SymbolsAndSymbolTables/#symbol-visibility

I think that InnerSymbols are currently missing the split between Nested and Private visibility.

May 02 '24 21:05 youngar

@youngar, @dtzSiFive thank you for the reviews. Based on your feedback, and the fact that I'll be OOF for a few weeks come Wednesday, I think the scope of the changes/thinking required is larger than what I'll be able to do before then. Based on how critical this becomes internally, @teqdruid might chime in here in one of the following weeks. If not, I'll get back to working on this come June.

May 03 '24 14:05 mortbopet

for instance, we are using ibis.container as the nest-able thing, that is, our InnerSymbolTable operation. If ibis.container were to implement an MLIR symbol, this would imply that an ibis.container itself would have to be a SymbolTable which would defeat the whole purpose of this.

For the nesting case, no. ibis.container can have an optional symbol and an optional inner symbol, mutually exclusive. Top level containers would define a symbol, nested ones would define an inner symbol.

One thing I'm wondering the more I think about this is why you need nested inner symbol tables. If we assume the general practice in circt that (inner) symbol name doesn't imply user-visible name, and given that inner-symbols can be defined arbitrarily deep in region nesting, then the only thing nesting really gives you is aliasing in inner symbol values. I can already have: ST{ op[S=foo] { op[IS=bar] { op[IS=baz] } } and refer to baz uniquely and canonically with hierpaths [foo, baz]. I do have to ensure baz isn't defined in other nestings, but since the name itself shouldn't have meaning, this shouldn't be a big deal.

May 06 '24 14:05 darthscsi

One thing I'm wondering the more I think about this is why you need nested inner symbol tables. If we assume the general practice in circt that (inner) symbol name doesn't imply user-visible name, and given that inner-symbols can be defined arbitrarily deep in region nesting, then the only thing nesting really gives you is aliasing in inner symbol values. I can already have: ST{ op[S=foo] { op[IS=bar] { op[IS=baz] } } and refer to baz uniquely and canonically with hierpaths [foo, baz]. I do have to ensure baz isn't defined in other nestings, but since the name itself shouldn't have meaning, this shouldn't be a big deal.

That's what we said in the meeting last Nov (IIRC). It just doesn't sit right with me (and even moreso with Morten). Why do you need nested symbols at all? Why not just have a flat symbol namespace and make all CIRCT symbols (to differentiate from upstream symbols) unique? This is essentially what we've been talking about with using Distinct attributes, yes? (Frankly, I think this would save a lot of complexity so I'm beginning to get on board with it.) There's something arbitrary about two-level symbols which just doesn't sit right with us.

May 10 '24 23:05 teqdruid

circt circt copied to clipboard

[HW] Rework InnerSym infra to support nested symbol tables

circt
circt copied to clipboard