circt [Moore] [Canonicalizer] Lower struct-related assignOp

Using pass canonicalizers: lower structExtractRefOp to structInjectOp Refresh referring to struct SSA when new struct is created (structInjectOp) with pass simplifyprocedure and mem2reg update union verify function Register func dialect for func.func Op

Jul 18 '24 09:07 mingzheTerapines

@fabianschuiki which is better?

  %0 = moore.constant 0 : i64
  %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
  %f = moore.variable %1 : <struct<{a: i32, b: i32}>>

a.rewrite variableOp

 %0 = moore.constant 0 : i64
 %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
 %2 = moore.struct_extract %1 , "a"
 %3 = moore.struct_extract %1 , "b"
 %4 = moore.struct_create %2, %3 : !moore.i32, !moore.i32 -> <struct<{a: i32, b: i32}>>

The problem is structExtractOp is a readOp like Op, the input should be refType value. But conversionOp returns unpackedType value. We may need a Op like addressOp opposite readOp So Ir will be like this

 %0 = moore.constant 0 : i64
 %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
%2 = moore.address %1
 %3 = moore.struct_extract %2 , "a"
%4 = moore.address %1
 %5 = moore.struct_extract %4 , "b"
 %6 = moore.struct_create %3, %5 : !moore.i32, !moore.i32 -> <struct<{a: i32, b: i32}>>

b. rewrite conversionOp and variableOp

  %0 = moore.constant 0 : i64
  %1 = moore.constant 0 : i32
  %2 = moore.extract %0 from %1 
  %3 = moore.constant 32 : i32
  %4 = moore.extract %0 from %3 
  %5  = moore.concat %2, %4
  %f = moore.struct_create %5 : <struct<{a: i32, b: i32}>>

Then

  %0 = moore.constant 0 : i64
  %1 = moore.constant 0 : i32
  %2 = moore.extract %0 from %1 
  %3 = moore.constant 32 : i32
  %4 = moore.extract %0 from %3 
  %f = moore.struct_create %2, %4 : <struct<{a: i32, b: i32}>>

Jul 19 '24 02:07 mingzheTerapines

Hmmm I don't think struct_create can return a ref type. It simply creates a struct<{...}> SSA value, but it does not allocate any storage like variable. Same with struct_extract: it operates directly on struct<{...}> values, not ref types. The only operation that goes from ref<T> to T is moore.read.

Jul 19 '24 17:07 fabianschuiki

Hmmm I don't think struct_create can return a ref type. It simply creates a struct<{...}> SSA value, but it does not allocate any storage like variable. Same with struct_extract: it operates directly on struct<{...}> values, not ref types. The only operation that goes from ref<T> to T is moore.read.

In Moore, as I understand, RefType means pointer that does not equally allocate memory.

// CHECK-LABEL: moore.module @structCreateFold
moore.module @structCreateFold(in %a : !moore.i1, out b : !moore.i1) {
    %0 = moore.struct_create %a : !moore.i1 -> struct<{a: i1}>
    %1 = moore.struct_extract %0, "a" : <struct<{a: i1}>> -> i1
    // CHECK:  moore.output %a : !moore.i1
    moore.output %1 : !moore.i1
  }

For Ops like these, I can not read field %a if %0 is not a refType. Unless, we have Op like addressOp I mentioned.

Jul 22 '24 01:07 mingzheTerapines

// CHECK-LABEL: moore.module @structCreateFold
moore.module @structCreateFold(in %a : !moore.i1, out b : !moore.i1) {
    %0 = moore.struct_create %a : !moore.i1 -> struct<{a: i1}>
    %1 = moore.struct_extract %0, "a" : <struct<{a: i1}>> -> i1
    // CHECK:  moore.output %a : !moore.i1
    moore.output %1 : !moore.i1
  }

For Ops like these, I can not read field %a if %0 is not a refType. Unless, we have Op like addressOp I mentioned.

You should be able to get field "a" as follows:

%0 = moore.struct_create %a : !moore.i1 -> !moore.struct<{a: i1}>
%1 = moore.struct_extract %0, "a" : !moore.struct<{a: i1}> -> i1  // this operates on struct, not ref<struct>
moore.output %1 : !moore.i1

The struct_extract op should operate on !moore.struct type. There should be a separate struct_extract_ref op if you need to go from a ref<struct<{a: T}>> to a ref<T>.

Jul 22 '24 18:07 fabianschuiki

@fabianschuiki structExtractOp reads sub memory slot relates to as readOp in Moore dialect and LoadOp in memrefOp. So it should operate on ref<structType>.

def LoadOp : MemRef_Op
let arguments = (ins Arg<AnyMemRef, "the reference to load from",
                           [MemRead]>:$memref,
                       Variadic<Index>:$indices,
                       DefaultValuedOptionalAttr<BoolAttr, "false">:$nontemporal);

Jul 23 '24 01:07 mingzheTerapines

@fabianschuiki which is better?

  %0 = moore.constant 0 : i64
  %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
  %f = moore.variable %1 : <struct<{a: i32, b: i32}>>

a.rewrite variableOp

 %0 = moore.constant 0 : i64
 %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
 %2 = moore.struct_extract %1 , "a"
 %3 = moore.struct_extract %1 , "b"
 %4 = moore.struct_create %2, %3 : !moore.i32, !moore.i32 -> <struct<{a: i32, b: i32}>>

The problem is structExtractOp is a readOp like Op, the input should be refType value. But conversionOp returns unpackedType value. We may need a Op like addressOp opposite readOp So Ir will be like this

 %0 = moore.constant 0 : i64
 %1 = moore.conversion %0 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
%2 = moore.address %1
 %3 = moore.struct_extract %2 , "a"
%4 = moore.address %1
 %5 = moore.struct_extract %4 , "b"
 %6 = moore.struct_create %3, %5 : !moore.i32, !moore.i32 -> <struct<{a: i32, b: i32}>>

b. rewrite conversionOp and variableOp

  %0 = moore.constant 0 : i64
  %1 = moore.constant 0 : i32
  %2 = moore.extract %0 from %1 
  %3 = moore.constant 32 : i32
  %4 = moore.extract %0 from %3 
  %5  = moore.concat %2, %4
  %f = moore.struct_create %5 : <struct<{a: i32, b: i32}>>

Then

  %0 = moore.constant 0 : i64
  %1 = moore.constant 0 : i32
  %2 = moore.extract %0 from %1 
  %3 = moore.constant 32 : i32
  %4 = moore.extract %0 from %3 
  %f = moore.struct_create %2, %4 : <struct<{a: i32, b: i32}>>

@uenoku You are right, it was outdated. Please see these demo. Do you think we need addressOp opposite to readOp in Moore dialect?

Jul 24 '24 05:07 mingzheTerapines

Hmmm I don't think struct_create can return a ref type. It simply creates a struct<{...}> SSA value, but it does not allocate any storage like variable. Same with struct_extract: it operates directly on struct<{...}> values, not ref types. The only operation that goes from ref<T> to T is moore.read.

In Moore, as I understand, RefType means pointer that does not equally allocate memory. ... For Ops like these, I can not read field %a if %0 is not a refType. Unless, we have Op like addressOp I mentioned.

Hey, @mingzheTerapines. The ref<T> represents a reference to a type. We discussed this at the PR(https://github.com/llvm/circt/pull/7082). The memref.load as you mentioned, it's the reference to load from... I agree with @fabianschuiki's viewpoint, we should tweak the result type of struct_create to non-refType. The struct_extract is also.

Jul 26 '24 05:07 terapines-osc-circt

@hailongSun2000 I partly agree with you. structCreateOp can return unpacked type.

Jul 26 '24 05:07 mingzheTerapines

For the test, it actually has two steps: original

moore.module @structExtractRefLower2Inject(out a : !moore.struct<{a: i32, b: i32}>) {
  %ii = moore.variable : <struct<{a: i32, b: i32}>>
  %0 = moore.struct_extract_ref %ii, "a" : <struct<{a: i32, b: i32}>> -> <i32>
  %1 = moore.constant 1 : i32
  moore.blocking_assign %0, %1 : i32
  %2 = moore.read %ii : <struct<{a: i32, b: i32}>>
  moore.output %2 : !moore.struct<{a: i32, b: i32}>
}

Firstly rewrite patern: variableOp -> constantOp for initial field value + structCreateOp + addressOp for variableOp requiring returning RefType blockingAssignOp + structExtrcatRefOp -> structInjectOp

moore.module @structExtractRefLower2Inject(out a : !moore.struct<{a: i32, b: i32}>) {
    %0 = moore.constant 0 : i32
    %1 = moore.constant 0 : i32
    %2 = moore.struct_create %0, %1 : !moore.i32, !moore.i32 -> struct<{a: i32, b: i32}>
    %3 = moore.address %2 : <struct<{a: i32, b: i32}>>
    %4 = moore.constant 1 : i32
    %5 = moore.read %3 : <struct<{a: i32, b: i32}>>
    %6 = moore.struct_inject %5, "a", %4 : struct<{a: i32, b: i32}>, i32
    %7 = moore.address %6 : <struct<{a: i32, b: i32}>>
    %8 = moore.read %7 : <struct<{a: i32, b: i32}>>
    moore.output %8 : !moore.struct<{a: i32, b: i32}>
  }

Secondly rewrite patern: readOp ->If read addressOp, directly use addressOp's input value replacing the use of readOp structInjectOp -> Chase a chain of struct_inject ops, with an optional final struct_create, and take note of the values assigned to each field.

moore.module @structExtractRefLower2Inject(out a : !moore.struct<{a: i32, b: i32}>) {
    %0 = moore.constant 0 : i32
    %1 = moore.constant 1 : i32
    %2 = moore.struct_create %1, %0 : !moore.i32, !moore.i32 -> struct<{a: i32, b: i32}>
    moore.output %2 : !moore.struct<{a: i32, b: i32}>
  }

Jul 31 '24 03:07 mingzheTerapines

I think you can just use moore.variable instead of a new moore.address op. It looks like these would do the same thing.

The steps you are describing sound exactly like a mem2reg pass. Does MLIR's existing mem2reg pass potentially already do what you are describing? That would probably be easier than doing it by hand. Especially since you can't just directly replace variable ops with struct_create, and if there are multiple assigns to the variable in between, the pass becomes much more complicated. I'm pretty sure you can't do this with rewriting patterns alone -- there's probably quite a bit of analysis involved (like what mem2reg does).

Aug 01 '24 16:08 fabianschuiki

@Moxinilian @fabianschuiki It seems SROA Pass + Mem2Reg Pass look very similar to Canonicalize Pass. They all rewrite Ops to original value and remove unused value. What responsibility should they do? I was considering SROA+Mem2Reg for local variable, Canonicalize for global variable.

Aug 02 '24 01:08 mingzheTerapines

I think you can just use moore.variable instead of a new moore.address op. It looks like these would do the same thing.

The steps you are describing sound exactly like a mem2reg pass. Does MLIR's existing mem2reg pass potentially already do what you are describing? That would probably be easier than doing it by hand. Especially since you can't just directly replace variable ops with struct_create, and if there are multiple assigns to the variable in between, the pass becomes much more complicated. I'm pretty sure you can't do this with rewriting patterns alone -- there's probably quite a bit of analysis involved (like what mem2reg does).

But we want to lower variableOp of struct to structCreateOp, the result type is different which needs to be the same.

berfore
 %ii = moore.variable : <struct<{a: i32, b: i32}>>
//after
    %0 = moore.constant 0 : i32
    %1 = moore.constant 0 : i32
    %2 = moore.struct_create %0, %1 : !moore.i32, !moore.i32 -> struct<{a: i32, b: i32}>
    %3 =  moore.variable %2 : <struct<{a: i32, b: i32}>>

It seems not elegent

Aug 02 '24 01:08 mingzheTerapines

Mem2Reg is most useful when you have branching control flow or interaction between dialects, hence why its effect is conceptually simple here. I don't think it makes sense to use canonicalization to do what it does as it's not very local, it requires tracking many effects (notably, memory side-effects).

I'm still not sure what you are trying to achieve with global variables. How would you want to eliminate a global variable?

Aug 02 '24 11:08 Moxinilian

I agree with @Moxinilian: I don't think you can use a canonicalization to replicate a mem2reg pass. Variables can be read and written from many different procedures and module-level operations at the same time, and it will not be possible to find a simple value substitution for all of those situations. It's important to not treat a moore.module body as a program that runs from top to bottom: the module has no control flow, so you cannot reason about moore.assign as taking a current value of a variable and updating it to a different value.

I think what you are trying to do is eliminate as many variables in the module and replace them with an SSA value directly. That would be a great thing to do! The moore.assigned_variable op is a great starting point for that: if a variable has only a single moore.assign, you can merge the variable and assigned value into moore.assigned_variable (note that this is different from the initial value of a moore.variable). I think this is already implemented as a canonicalization.

Another canonicalization you could do is to try and pull assigns to struct_extract_ref up through the extract:

// before:
%3 = moore.struct_extract_ref %0, "a"
moore.assign %3, %1
%4 = moore.struct_extract_ref %0, "b"
moore.assign %4, %2

// after:
%3 = moore.struct_create %1, %2
moore.assign %1, %3

Aug 02 '24 16:08 fabianschuiki

@fabianschuiki I just tested new version of circt-verilog, I found I may not need to lower to structCreateOp for variable of struct.

module Foo(output [63:0] cout);
  struct packed {int a; int b;} ST = 'b0;
  assign cout = ST;
endmodule
// -----// IR Dump Before CSE (cse) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %cout = moore.net wire : <l64>
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.assign %cout, %3 : l64
  %4 = moore.read %cout : <l64>
  moore.output %4 : !moore.l64
}

// -----// IR Dump Before Canonicalizer (canonicalize) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %cout = moore.net wire : <l64>
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.assign %cout, %3 : l64
  %4 = moore.read %cout : <l64>
  moore.output %4 : !moore.l64
}

// -----// IR Dump Before LowerConcatRef (moore-lower-concatref) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.output %3 : !moore.l64
}

// -----// IR Dump Before SROA (sroa) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.output %3 : !moore.l64
}

// -----// IR Dump Before Mem2Reg (mem2reg) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.output %3 : !moore.l64
}

// -----// IR Dump Before CSE (cse) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.output %3 : !moore.l64
}

// -----// IR Dump Before Canonicalizer (canonicalize) //----- //
moore.module @Foo(out cout : !moore.l64) {
  %0 = moore.constant 0 : i32
  %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
  %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
  %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
  %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
  moore.output %3 : !moore.l64
}

// -----// IR Dump Before ConvertMooreToCore (convert-moore-to-core) //----- //
module {
  moore.module @Foo(out cout : !moore.l64) {
    %0 = moore.constant 0 : i32
    %1 = moore.conversion %0 : !moore.i32 -> !moore.struct<{a: i32, b: i32}>
    %ST = moore.variable %1 : <struct<{a: i32, b: i32}>>
    %2 = moore.read %ST : <struct<{a: i32, b: i32}>>
    %3 = moore.conversion %2 : !moore.struct<{a: i32, b: i32}> -> !moore.l64
    moore.output %3 : !moore.l64
  }
}


// -----// IR Dump Before CSE (cse) //----- //
hw.module @Foo(out cout : i64) {
  %c0_i32 = hw.constant 0 : i32
  %c0_i32_0 = hw.constant 0 : i32
  %0 = comb.concat %c0_i32_0, %c0_i32 : i32, i32
  %1 = hw.bitcast %0 : (i64) -> !hw.struct<a: i32, b: i32>
  %ST = llhd.sig %1 : !hw.struct<a: i32, b: i32>
  %2 = llhd.prb %ST : !hw.inout<struct<a: i32, b: i32>>
  %3 = hw.bitcast %2 : (!hw.struct<a: i32, b: i32>) -> i64
  hw.output %3 : i64
}

Aug 22 '24 08:08 mingzheTerapines

@mingzheTerapines I agree, this looks good! We could still improve the canonicalizer of moore.variable a bit: currently variables only get replaced with a moore.assigned_variable if there is a single assignment to the variable. In your example however we have a moore.variable that has an initial value which is never changed by an assign. This combination of moore.variable with initial assignment and no continuous assignment could also be lowered to a moore.assigned_variable 😃

Aug 22 '24 09:08 fabianschuiki

@fabianschuiki canonicalize Pass seems not as reliable as mem2reg Pass. Because it can only be used for folding structinject or structextract when struct varible itself not used between these two folding Ops. But I am not sure this kind of using check has been done. What's more some procedure region should be handled very carefully. For example

moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
  %a = moore.variable : <struct<{a: i32, b: i32}>>
  %x = moore.variable : <i32>
  %y = moore.variable : <i32>
  %z = moore.variable : <i32>
  %ii = moore.variable : <struct<{a: i32, b: i32}>>
  moore.procedure always_comb {
    %1 = moore.struct_extract_ref %ii, "a" : <struct<{a: i32, b: i32}>> -> <i32>
    %2 = moore.constant 1 : i32
    moore.blocking_assign %1, %2 : i32
    %3 = moore.struct_extract_ref %ii, "b" : <struct<{a: i32, b: i32}>> -> <i32>
    %4 = moore.constant 4 : i32
    moore.blocking_assign %3, %4 : i32
    moore.return
  }
  %0 = moore.read %a : <struct<{a: i32, b: i32}>>
  moore.output %0 : !moore.struct<{a: i32, b: i32}>
}

We can not canonicalize to

moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
  %a = moore.variable : <struct<{a: i32, b: i32}>>
  %x = moore.variable : <i32>
  %y = moore.variable : <i32>
  %z = moore.variable : <i32>
  moore.procedure always_comb {
  %ii = moore.struct_create %2, %3
    moore.return
  }
  %0 = moore.read %a : <struct<{a: i32, b: i32}>>
  moore.output %0 : !moore.struct<{a: i32, b: i32}>
}

Aug 23 '24 02:08 mingzheTerapines

@fabianschuiki As you known, there are some problems with --moore-simplify-procedures, so it can not be enabled. But it works making sturctinjectOp be chain use like. I tested local successfully. Here is the flow importverilog

moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
  %a = moore.variable : <struct<{a: i32, b: i32}>>
  %x = moore.variable : <i32>
  %y = moore.variable : <i32>
  %z = moore.variable : <i32>
  %ii = moore.variable : <struct<{a: i32, b: i32}>>
   moore.procedure always_comb {
    %1 = moore.struct_extract_ref %ii, "a" : <struct<{a: i32, b: i32}>> -> <i32>
    %2 = moore.constant 1 : i32
    moore.blocking_assign %1, %2 : i32
    %3 = moore.struct_extract_ref %ii, "b" : <struct<{a: i32, b: i32}>> -> <i32>
    %4 = moore.constant 4 : i32
    moore.blocking_assign %3, %4 : i32
     moore.return
  }
  %0 = moore.read %a : <struct<{a: i32, b: i32}>>
  moore.output %0 : !moore.struct<{a: i32, b: i32}>
}

canonicalizer

module {
  moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
    %0 = moore.constant 4 : i32
    %1 = moore.constant 1 : i32
    %a = moore.variable : <struct<{a: i32, b: i32}>>
    %ii = moore.variable : <struct<{a: i32, b: i32}>>
    moore.procedure always_comb {
      %3 = moore.read %ii : <struct<{a: i32, b: i32}>>
      %4 = moore.struct_inject %3, "a", %1 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %ii, %4 : struct<{a: i32, b: i32}>
      %5 = moore.read %ii : <struct<{a: i32, b: i32}>>
      %6 = moore.struct_inject %5, "b", %0 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %ii, %6 : struct<{a: i32, b: i32}>
      moore.return
    }
    %2 = moore.read %a : <struct<{a: i32, b: i32}>>
    moore.output %2 : !moore.struct<{a: i32, b: i32}>
  }
}

--moore-simplify-procedures

module {
  moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
    %0 = moore.constant 4 : i32
    %1 = moore.constant 1 : i32
    %a = moore.variable : <struct<{a: i32, b: i32}>>
    %ii = moore.variable : <struct<{a: i32, b: i32}>>
    moore.procedure always_comb {
      %3 = moore.read %ii : <struct<{a: i32, b: i32}>>
      %4 = moore.variable : <struct<{a: i32, b: i32}>>
      moore.blocking_assign %4, %3 : struct<{a: i32, b: i32}>
      %5 = moore.read %4 : <struct<{a: i32, b: i32}>>
      %6 = moore.struct_inject %5, "a", %1 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %4, %6 : struct<{a: i32, b: i32}>
      %7 = moore.read %4 : <struct<{a: i32, b: i32}>>
      moore.blocking_assign %ii, %7 : struct<{a: i32, b: i32}>
      %8 = moore.read %4 : <struct<{a: i32, b: i32}>>
      %9 = moore.struct_inject %8, "b", %0 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %4, %9 : struct<{a: i32, b: i32}>
      %10 = moore.read %4 : <struct<{a: i32, b: i32}>>
      moore.blocking_assign %ii, %10 : struct<{a: i32, b: i32}>
      moore.return
    }
    %2 = moore.read %a : <struct<{a: i32, b: i32}>>
    moore.output %2 : !moore.struct<{a: i32, b: i32}>
  }
}

mem2reg

module {
  moore.module @LocalVar(out a : !moore.struct<{a: i32, b: i32}>) {
    %0 = moore.constant 4 : i32
    %1 = moore.constant 1 : i32
    %a = moore.variable : <struct<{a: i32, b: i32}>>
    %ii = moore.variable : <struct<{a: i32, b: i32}>>
    moore.procedure always_comb {
      %3 = moore.constant 0 : i64
      %4 = moore.conversion %3 : !moore.i64 -> !moore.struct<{a: i32, b: i32}>
      %5 = moore.read %ii : <struct<{a: i32, b: i32}>>
      %6 = moore.struct_inject %5, "a", %1 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %ii, %6 : struct<{a: i32, b: i32}>
      %7 = moore.struct_inject %6, "b", %0 : struct<{a: i32, b: i32}>, i32
      moore.blocking_assign %ii, %7 : struct<{a: i32, b: i32}>
      moore.return
    }
    %2 = moore.read %a : <struct<{a: i32, b: i32}>>
    moore.output %2 : !moore.struct<{a: i32, b: i32}>
  }
}

Aug 23 '24 06:08 mingzheTerapines

@fabianschuiki Could you please give some advice for this commit, thanks a lot.

Sep 24 '24 02:09 mingzheTerapines

@mingzheTerapines I'm wondering if the SROA you have added to the Moore dialect doesn't more or less do a similar thing than this PR :thinking:. If your goal is to get rid of subfield assignments to structs and arrays, SROA feels like a pretty good approach because it eliminates the struct altogether.

Converting assignments to struct fields into assignments to the entire struct is a lot more complicated. The difficult part is that you have to make sure that assigning to the entire struct doesn't overwrite other assignments to individual fields. For example, you could have the following:

struct packed {
  int a;
  int b;
} x;
initial x.a <= #1ns 42;
initial x.b <= #2ns 9001;

This would show up like the following in the IR:

%x = moore.variable
moore.procedure initial {
  %x.a = moore.struct_extract_ref %x, "a"
  moore.nonblocking_assign %x.a, 42 after 1ns
}
moore.procedure initial {
  %x.b = moore.struct_extract_ref %x, "b"
  moore.nonblocking_assign %x.b, 9001 after 2ns
}

Here you'd be tempted to try and convert the assignment to the struct fields into assignments of the entire struct, like the following:

moore.procedure initial {
  %0 = moore.read %x
  %1 = moore.struct_inject %0, "a", 42
  moore.nonblocking_assign %x, %1 after 1ns
}
moore.procedure initial {
  %0 = moore.read %x
  %1 = moore.struct_inject %0, "b", 9001
  moore.nonblocking_assign %x, %1 after 2ns
}

Unfortunately this wouldn't work properly, since the two procedures would read the initial value of x, {0, 0}, compute an updated value, {42, 0} and {0, 9001}, and then schedule an assignment at 1ns and 2ns. This means that x would have the following values:

0ns  x = {0, 0}
1ns  x = {42, 0}
2ns  x = {0, 9001}  // correct would be {42, 9001}

I suspect that there will be a lot of other examples where you would get a similar kind of misbehavior by simply overwriting an entire struct's value where you originally only accessed an individual field.

There are a few ways to make this work though:

You could rely on the subtle semantics of packed structs which, if I recall correctly, guarantee that even when you write to just a field, the write will update the entire struct in the background. You could only do your conversion if the struct or array is packed.
You could ensure that the struct or array you are trying to optimize is only accessed by the current procedure, and that there is absolutely no way for other references to the same struct to exists (no aliasing). This would be the case for example if the struct/array ref you're dealing with comes directly from a moore.variable (not a block argument or some other op), and that all uses of the variable are in the current procedure, and there is no time/event control that could suspend execution (or alternatively that all uses are in the same block and there are no side-effecting ops in between them). If this is the case, you are sure that you see every assignment to the struct, and then you can perform any kind of merging that you want :smiley:.
You could treat module-level moore.assigns to variables differently. These should be simpler to deal with: if there is no aliasing (again, the struct/array ref does not escape through an output port, or is not assigned in any procedure) and every field of the struct has a single unique assign to it, then you can convert everything into an assign for the entire struct. This should actually be handled more or less by the mem2reg pass already. This would be a very nice optimization to have! :partying_face:

In any case, since this is a pretty complicated optimization that requires analysis across an entire module, this would have to be a separate transformation pass instead of a canonicalizer.

Sep 24 '24 17:09 fabianschuiki

@fabianschuiki Yes I found you mentioned in https://github.com/llvm/circt/pull/7158#issuecomment-2218422232

 Pretty sure moore.struct_inject doesn't really work for module-level variables, since the assignments in a module all occur in parallel, so there is no order in the sense of old struct before the assignment and new struct after the assignment.

So I let moore.struct_inject works in moore.blocking_assign canonicalize pass, it make sense. Also I remembered you mentioned reading as a struct and writing would be faster than reading as several parts and writing. So personally I think moore.struct_inject is still efficient for moore.blocking_assign. SROA could work for parallel case.

Sep 25 '24 01:09 mingzheTerapines

I like the idea of using struct_inject for blocking assigns in procedures, for variables declared in the procedure! :smiley: That way you don't have to reason about module-level variables that may have other reads/assigns that you cannot easily control.

Mem2reg should offer a way to generate struct injects for assignments through subfield accesses. I'm not entirely sure how that works; you might have to declare a memory slot for every field in a struct/array?

Sep 25 '24 03:09 fabianschuiki

circt circt copied to clipboard

[Moore] [Canonicalizer] Lower struct-related assignOp

circt
circt copied to clipboard