circt [MooreToCore] Lower moore.variable into hw.wire.

moore.net is not considered for the time being. And just lower the variables of two-state and four-state(have initial value) into hw.wire. If the variables don't have an initial value, assign zero to it. After this, when meeting moore.assign(continuous assignment), will update the operand of the variables.

May 22 '24 10:05 terapines-osc-circt

How are you planning to deal with assignments to variables? Is it possible that after Mem2Reg or your variable removal pass (#6990) there are no more variables?

Assuming you are specifically dealing with module-level variables outside of procedures, maybe your conversion would also have to look for any moore.assigns to the variable and use the assigned value for the hw.wire operand?

May 22 '24 22:05 fabianschuiki

The draft (#6990) is just to eliminate local variables(procedures' body), the global variables(module-level) don't be handled. So in my thought, the local variables don't exist when we execute MooreToCore, they will be eliminated after executing Mem2Reg or (#6990).

That's right! I am planning to use the assigned value for the hw.wire operand when looking for any moore.assign. For example: from

%a = moore.variable : !moore.i32
%0 = moore.constant 1 : !moore.i32
%b = moore.variable : !moore.i32

moore.assign %a %0(or %b) : !moore.i32

to

%c0_i32 = hw.constant 0 : i32
%a = hw.wire %c0_i32 : i32

%c1_i32 = hw.constant 1 : i32

%c0_i32_0 = hw.constant 0 : i32
%b = hw.wire %c0_i32_0 : i32

After meeting moore.assign %a, %0(or %b) : !moore.i32

%c0_i32 = hw.constant 0 : i32

%a = hw.wire %c1_i32(or %b) : i32

%c1_i32 = hw.constant 1 : i32

%c0_i32_0 = hw.constant 0 : i32
%b = hw.wire %c0_i32_0 : i32

May 23 '24 01:05 terapines-osc-circt

Sorry for the late reply. I just spent the Dragon Boat Festival. I viewed the llhd.sig, it's similar to moore. variable and moore.net. And I noticed the llhd. var and other memory ops. So a kind feeling! :smile: In my thought, you mentioned dynamically-indexed arrays like arr[x] = y which is so cool! I have a question about moore.variable and moore.assign. As you said, merge them into one new op like %a = moore.assigned_variable %0, the original ops moore.variable and moore.assign whether exist?

For the registers, I agree with you that we need a dedicated pass. It's so unreasonable to analyze IRs in MooreToCore. I have a branch that handles sequential things like always@(posedge clk). There are many codes for analysis.

Jun 11 '24 02:06 terapines-osc-circt

I think we would still keep moore.variable and moore.assign in the IR, but we could create a canonicalization pattern or a separate simplification pass that finds "easy" uses of variable and assign and merges them into assigned_variable. That allows the MooreToCore lowering to look for assigned_variable to create an hw.wire, and for variable and assign it could generate llhd.sig and llhd.drv :smiley:

Jun 12 '24 17:06 fabianschuiki

Could we directly lower moore.vaiable into llhd.sig and moore.assign into llhd.drv, rather than lower a partial of them into moore.assigned_moore? And I noticed llhd.con, which looks more like continuous assignments. And I think whether we need to extend assign ops with a delay control like llhd.drv.

Jun 13 '24 06:06 terapines-osc-circt

Could we directly lower moore.vaiable into llhd.sig and moore.assign into llhd.drv, rather than lower a partial of them into moore.assigned_moore?

Lowering of moore.{variable,assign} to llhd.{sig,drv} should always be correct. However, it would be nice if the passes in the Moore dialect could already detect if certain variables follow a simple use model and then lower them to hw.wire instead of the LLHD dialect ops. Ideally that would allow us to not emit any LLHD ops for synthesizable Verilog inputs. LLHD is going to be slower to simulate as well because it interacts with an event queue. Therefore it would be nice to already detect obvious cases in the Moore dialect where we can lower to HW/Comb/Seq instead of LLHD.

And I noticed llhd.con, which looks more like continuous assignments. And I think whether we need to extend assign ops with a delay control like llhd.drv.

Yes I think you are right. The nonblocking assignments can carry a delay on the connect itself, as in x <= #1ns y. That would be great to capture on the moore.nonblocking_assign op 🙂.

Jun 13 '24 17:06 fabianschuiki

I learn it.

Jun 14 '24 01:06 terapines-osc-circt

When I merge a declaration and the corresponding assignment into a new op called assigned_variable. I'll suffer from dominance error. For example:

moore.module @top() {
  %a = moore.variable : <i32>
  %w = moore.net wire : <l1>
  %0 = moore.constant 32 : i32
  moore.assign %a, %0 : i32
  %1 = moore.constant true : i1
  %2 = moore.conversion %1 : !moore.i1 -> !moore.l1
  moore.assign %w, %2 : l1
  moore.output
}

I think after executing MergeAssignments pass, the IR becomes:

moore.module @top() {
  %a = moore.variable : <i32>

  %a_0 = moore.assigned_variable %0 : <i32>

  %w = moore.net wire : <l1>
  
  %w_0 = moore.assigned_variable %2 : <l1>
   
  %0 = moore.constant 32 : i32
  moore.assign %a, %0 : i32
  %1 = moore.constant true : i1
  %2 = moore.conversion %1 : !moore.i1 -> !moore.l1
  moore.assign %w, %2 : l1
  moore.output
}

However, the error operand #0 does not dominate this use int a; will be thrown due to the SSACFG region of SVModule. So I have to create assigned_variable after its operand. Like:

%0 = moore.constant 32 : i32
%a_0 = moore.assigned_variable %0 : <i32>
...
%2 = moore.conversion %1 : !moore.i1 -> !moore.l1
%w_0 = moore.assigned_variable %2 : <l1>

And keep the origin %a and %w whether they are a little unsuitable. Because we need to update the define-use chain by replacing %a with %a_0. In other words, assigned_variable is not suitable for the SSACFG region, right?

Jun 27 '24 06:06 terapines-osc-circt

Oh that's a great catch! The SVModuleOp is currently an SSACFG region, which means SSA dominance rules apply and operations producing a value have to appear before any operations that use the value. This works in the plain SV case because there is a separation between declaring a wire/variable and then assigning a value to it.

However, since we want to transform the Moore IR from that declaration-assignment style into something that more closely resembles the HW dialect and hardware netlists, it would make sense to make SVModuleOp have a graph region instead. (Same as hw.module.) This will remove the dominance condition and allow you to have operations use values before the operations that define them.

Take a look at HWModuleOp and how it implements the RegionKindInterface. You should be able to do the same thing for SVModuleOp and turn it into a graph region. Then we'll be able to do these more aggressive optimizations like your assigned_variable op, which would be fantastic! 🥳

Jun 27 '24 19:06 fabianschuiki

I noticed that HWModule has RegionKindInterface, I'll add this trait for SVModule later.

Jun 28 '24 03:06 terapines-osc-circt

Hey, @fabianschuiki! Assignments like

reg [7:0] q;
always@(posedge clk) begin
  if(rst)
    q[7:5] <= 3'd0;
  else if(cond1)
    q[7:5] <= 3'b101;
  else if(cond2)
   q[7:5] <= 3'b010;
  else
   q[7:5] <= 3'b111;
   
   q[4:2] <= 3'b111;
end
assign q[1:0] = 2'd0;

How do we lower this into moore.register? Or lower them into llhd.proc?

Jul 02 '24 09:07 terapines-osc-circt

I saw a complex case:

 integer i;
    always @(posedge HCLK or negedge HRESETn) begin
        if (~HRESETn) begin
            r_gpio_inten    <= 'b0;
            r_gpio_inttype0 <= 'b0;
            r_gpio_inttype1 <= 'b0;
            r_gpio_out      <= 'b0;
            r_gpio_dir      <= 'b0;
            r_iofcfg        <= 'b0;

            for (i = 0; i < 32; i = i + 1)
            	gpio_padcfg[i * 6+:6] <= 6'b000010;
        end else if ((PSEL && PENABLE) && PWRITE) begin
            case (s_apb_addr)
            	`REG_PADDIR:   r_gpio_dir      <= PWDATA;
            	`REG_PADOUT:   r_gpio_out      <= PWDATA;
            	`REG_INTEN:    r_gpio_inten    <= PWDATA;
            	`REG_INTTYPE0: r_gpio_inttype0 <= PWDATA;
            	`REG_INTTYPE1: r_gpio_inttype1 <= PWDATA;
            	`REG_IOFCFG:   r_iofcfg        <= PWDATA;
            	`REG_PADCFG0: begin
            	    gpio_padcfg[0+:6]  <= PWDATA[5:0];
            	    gpio_padcfg[6+:6]  <= PWDATA[13:8];
            	    gpio_padcfg[12+:6] <= PWDATA[21:16];
            	    gpio_padcfg[18+:6] <= PWDATA[29:24];
            	end
            	`REG_PADCFG1: begin
            	    gpio_padcfg[24+:6] <= PWDATA[5:0];
            	    gpio_padcfg[30+:6] <= PWDATA[13:8];
            	    gpio_padcfg[36+:6] <= PWDATA[21:16];
            	    gpio_padcfg[42+:6] <= PWDATA[29:24];
            	end
            	`REG_PADCFG2: begin
            	    gpio_padcfg[48+:6] <= PWDATA[5:0];
            	    gpio_padcfg[54+:6] <= PWDATA[13:8];
            	    gpio_padcfg[60+:6] <= PWDATA[21:16];
            	    gpio_padcfg[66+:6] <= PWDATA[29:24];
            	end
            	`REG_PADCFG3: begin
            	    gpio_padcfg[72+:6] <= PWDATA[5:0];
            	    gpio_padcfg[78+:6] <= PWDATA[13:8];
            	    gpio_padcfg[84+:6] <= PWDATA[21:16];
            	    gpio_padcfg[90+:6] <= PWDATA[29:24];
            	end
            	`REG_PADCFG4: begin
            	    gpio_padcfg[96+:6] <= PWDATA[5:0];
            	    gpio_padcfg[102+:6] <= PWDATA[13:8];
            	    gpio_padcfg[108+:6] <= PWDATA[21:16];
            	    gpio_padcfg[114+:6] <= PWDATA[29:24];
            	end
            	`REG_PADCFG5: begin
            	    gpio_padcfg[120+:6] <= PWDATA[5:0];
            	    gpio_padcfg[126+:6] <= PWDATA[13:8];
            	    gpio_padcfg[132+:6] <= PWDATA[21:16];
            	    gpio_padcfg[138+:6] <= PWDATA[29:24];
            	end
            	`REG_PADCFG6: begin
            	    gpio_padcfg[144+:6] <= PWDATA[5:0];
            	    gpio_padcfg[150+:6] <= PWDATA[13:8];
            	    gpio_padcfg[156+:6] <= PWDATA[21:16];
            	    gpio_padcfg[162+:6] <= PWDATA[29:24];
            	end
            	`REG_PADCFG7: begin
            	    gpio_padcfg[168+:6] <= PWDATA[5:0];
            	    gpio_padcfg[174+:6] <= PWDATA[13:8];
            	    gpio_padcfg[180+:6] <= PWDATA[21:16];
            	    gpio_padcfg[186+:6] <= PWDATA[29:24];
            	end
            endcase
	end
    end

My thought is firstly lowering scf.if(case will be translate into scf.if) into mux op like comb::mux, in this case, we can get four mux ops. Then we need to create four new moore.register. The four mux ops will separately as the input of the corresponding to moore.register. Like

%mux1 = moore.mux...
%mux2 = moore.mux...
%mux3 = moore.mux...
%mux4 = moore.mux...

%tmp1 = moore.register %clk %mux1 ...
%tmp2 = moore.register %clk %mux2 ...
%tmp3 = moore.register %clk %mux3 ...
%tmp4 = moore.register %clk %mux4 ...

However, I'm not sure how to introduce the name of the moore.register. I think use gpio_padcfg as the moore.register's name is not reasonable. What do @fabianschuiki think?

Jul 02 '24 10:07 terapines-osc-circt

I think maybe we need a new pass to merge the extract_refs, but for the above-mentioned case, I feel it's tricky to merge them. I have a primary thought. We need a container like ScopeHashTable due to if-else region and use it to collect the extract_ref. We need to get the declaration corresponding to extract_ref, like

REG_PADCFG0: begin
            	    gpio_padcfg[0+:6]  <= PWDATA[5:0];
            	    gpio_padcfg[6+:6]  <= PWDATA[13:8];
            	    gpio_padcfg[12+:6] <= PWDATA[21:16];
          	    gpio_padcfg[18+:6] <= PWDATA[29:24];
end

We use extract_ref to slice gpio_padcfg[24:192] from the declaration(gpio_padcfg), and then use concat_ref to concat them together. Meanwhile, we need another method like MergeConnect pass in the FIRRTL to integrate them. But there is another problem how to guarantee gpio_padcfg has the correct value, we need to concat PWDATAs and the initial value of gpio_padcfg[24:192] in-order.

By the way, I think maybe we need to add the extra method for an array to get its direction. Which is [7:0] or [0:7], I think this is necessary. View the FIXME in this PR(https://github.com/llvm/circt/pull/7216), we can not estimate if it's [7:0] or [0:7].

Please let me know if my expression is confused.

Jul 03 '24 03:07 terapines-osc-circt

@hailongSun2000 Same as struct type, we can handle like this

REG_PADCFG0: begin
            	    gpio_padcfg[0+:6]  <= PWDATA[5:0];
            	    gpio_padcfg[6+:6]  <= PWDATA[13:8];
            	    gpio_padcfg[12+:6] <= PWDATA[21:16];
          	    gpio_padcfg[18+:6] <= PWDATA[29:24];
end

SROA

 %0 = moore.extract_ref "%gpio_padcfg" , "0" : i4;
 %1 = moore.extract_ref "%gpio_padcfg" , "1" : i4;
 %3 = moore.extract_ref "%gpio_padcfg" , "2" : i4;
 %3 = moore.extract_ref "%gpio_padcfg" , "3" : i4;
 ...
 hw.ArrayCreateOp %0, %1, %2... : i4
%30 =  moore.extract_ref "%PWDATA" , "0" : i4;
%31 =  moore.extract_ref "%PWDATA" , "1" : i4;
%32 =  moore.extract_ref "%PWDATA" , "2" : i4;
%33 =  moore.extract_ref "%PWDATA" , "3" : i4;
hw.ArrayCreateOp %30, %31, %32... : i4
...

REG_PADCFG0: begin 
  %50 = moore.read %30
  moore.blockingassign %0, %30
  %51 = moore.read %30
  moore.blockingassign %1, %31
  %52 = moore.read %32
  moore.blockingassign %2, %32
 ...
end

Jul 05 '24 08:07 mingzheTerapines

Great idea! Let me deliberate.

Jul 05 '24 08:07 terapines-osc-circt

@hailongSun2000 Thanks for digging up that complex example. That is an interesting and challenging set of registers to be extracted. The LLHD lowering should be simple and straightforward, because we can more or less translate the Verilog into an equivalent LLHD process. So we always have this available as a "fallback" solution when we can't detect/extract registers.

It will probably be very challenging to write a single pass that looks at this always procedure and extracts all the registers correctly. Instead, we should probably divide the problem into smaller, simpler steps towards that goal.

One thing that makes this example challenging, is the fact that the <= assignments are distributed throughout the control flow, nested inside if, case, and loop statements. The problem would be a lot simpler if we had a canonicalization that would combine all assignments to the same variable into a single (possibly conditional?) assignment at the end of the procedure. Something like the following:

// initially
moore.procedure {
  scf.if not(%HRESETn) {
    moore.nonblocking_assign %r_iofcfg, 0
    scf.for %i from 0 to 2 {
      %0 = moore.extract_ref %gpio_padcfg, %i
      moore.nonblocking_assign %0, 2
    }
  } else {
    scf.if eq(%s_apb_addr, %REG_IOFCFG) {
      moore.nonblocking_assign %r_iofcfg, %PWDATA
    }
    scf.if eq(%s_apb_addr, %REG_PADCFG0) {
      %1 = moore.extract_ref %gpio_padcfg, 0
      moore.nonblocking_assign %1, %PWDATA
    }
    scf.if eq(%s_apb_addr, %REG_PADCFG1) {
      %2 = moore.extract_ref %gpio_padcfg, 1
      moore.nonblocking_assign %2, %PWDATA
    }
  }
}

// unroll loops that contain non-blocking variable assignments
moore.procedure {
  scf.if not(%HRESETn) {
    moore.nonblocking_assign %r_iofcfg, 0
    // loop unrolled:
    %0 = moore.extract_ref %gpio_padcfg, 0
    moore.nonblocking_assign %0, 2
    %1 = moore.extract_ref %gpio_padcfg, 1
    moore.nonblocking_assign %1, 2
  } else {
    scf.if eq(%s_apb_addr, %REG_IOFCFG) {
      moore.nonblocking_assign %r_iofcfg, %PWDATA
    }
    scf.if eq(%s_apb_addr, %REG_PADCFG0) {
      %2 = moore.extract_ref %gpio_padcfg, 0
      moore.nonblocking_assign %2, %PWDATA
    }
    scf.if eq(%s_apb_addr, %REG_PADCFG1) {
      %3 = moore.extract_ref %gpio_padcfg, 1
      moore.nonblocking_assign %3, %PWDATA
    }
  }
}

// pull non-blocking assignments upwards (iteration 0)
moore.procedure {
  scf.if not(%HRESETn) {
    moore.nonblocking_assign %r_iofcfg, 0
    %0 = moore.extract_ref %gpio_padcfg, 0
    moore.nonblocking_assign %0, 2
    %1 = moore.extract_ref %gpio_padcfg, 1
    moore.nonblocking_assign %1, 2
  } else {
    // if condition folded into a condition on the non-blocking assign
    moore.nonblocking_assign %r_iofcfg, %PWDATA if eq(%s_apb_addr, %REG_IOFCFG)
    %2 = moore.extract_ref %gpio_padcfg, 0
    moore.nonblocking_assign %2, %PWDATA if eq(%s_apb_addr, %REG_PADCFG0)
    %3 = moore.extract_ref %gpio_padcfg, 1
    moore.nonblocking_assign %3, %PWDATA if eq(%s_apb_addr, %REG_PADCFG1)
  }
}

// pull non-blocking assignments upwards (iteration 1)
moore.procedure {
  // if condition folded into a condition on the non-blocking assign
  moore.nonblocking_assign %r_iofcfg, 0 if not(%HRESETn)
  %0 = moore.extract_ref %gpio_padcfg, 0
  moore.nonblocking_assign %0, 2 if not(%HRESETn)
  %1 = moore.extract_ref %gpio_padcfg, 1
  moore.nonblocking_assign %1, 2 if not(%HRESETn)
  // if condition inverted (because of else) and AND-ed into assign condition
  moore.nonblocking_assign %r_iofcfg, %PWDATA if and(eq(%s_apb_addr, %REG_IOFCFG), %HRESETn)
  %2 = moore.extract_ref %gpio_padcfg, 0
  moore.nonblocking_assign %2, %PWDATA if and(eq(%s_apb_addr, %REG_PADCFG0), %HRESETn)
  %3 = moore.extract_ref %gpio_padcfg, 1
  moore.nonblocking_assign %3, %PWDATA if and(eq(%s_apb_addr, %REG_PADCFG1), %HRESETn)
}

// CSE
moore.procedure {
  %0 = moore.extract_ref %gpio_padcfg, 0
  %1 = moore.extract_ref %gpio_padcfg, 1
  moore.nonblocking_assign %r_iofcfg, 0 if not(%HRESETn)
  moore.nonblocking_assign %r_iofcfg, %PWDATA if and(eq(%s_apb_addr, %REG_IOFCFG), %HRESETn)
  moore.nonblocking_assign %0, 2 if not(%HRESETn)
  moore.nonblocking_assign %0, %PWDATA if and(eq(%s_apb_addr, %REG_PADCFG0), %HRESETn)
  moore.nonblocking_assign %1, 2 if not(%HRESETn)
  moore.nonblocking_assign %1, %PWDATA if and(eq(%s_apb_addr, %REG_PADCFG1), %HRESETn)
}

// combine non-blocking assignments to the same variable
moore.procedure {
  %0 = moore.extract_ref %gpio_padcfg, 0
  %1 = moore.extract_ref %gpio_padcfg, 1
  moore.nonblocking_assign %r_iofcfg, mux(not(%HRESETn), 0, %PWDATA) if or(not(%HRESETn), and(eq(%s_apb_addr, %REG_IOFCFG), %HRESETn))
  moore.nonblocking_assign %0, mux(not(%HRESETn), 2, %PWDATA) if or(not(%HRESETn), and(eq(%s_apb_addr, %REG_PADCFG0), %HRESETn))
  moore.nonblocking_assign %1, mux(not(%HRESETn), 2, %PWDATA) if or(not(%HRESETn), and(eq(%s_apb_addr, %REG_PADCFG1), %HRESETn))
}

The final form may be easier to analyze and map to a moore.register op. What do you think?

Jul 09 '24 17:07 fabianschuiki

Hey, @fabianschuiki! It looks so~ complex! But this is a great and reasonable solution. Let me think:

Because we use scf.while to represent loop statements, we need to unroll scf.while;
We need to extend nonblocking_assign, making the condition its operand, right? Or making src carry the condition?
We need to analyze the condition in scf.if, and then merge them as the operand of nonblocking_assign;
I remember you mentioned we can introduce moore.register like moore.register %clk, %input, en, %enable. If we make the condition the operand of nonblocking_assign, how to deal with it? If we make src carry the condition, I think it's a little unreasonable.

And I think we need to extend moore.register, like %name = moore.register clk, %clk, %input, (rst, (sync | async) %rst, %rstValue)?, (en, %enable)? Or disassemble it into multiple ops, like rst_register, en_register. What do you think?

Jul 10 '24 04:07 terapines-osc-circt

Oh! Make nonblocking_assign like llhd.drv.

Jul 10 '24 10:07 terapines-osc-circt

Yes you're right, we'd have to give moore.register additional operands for asynchronous resets. The synchronous resets we could model with just a multiplexer in front of the register, since the reset there doesn't require any special consideration :thinking:.

I was hoping that making the condition an operand of nonblocking_assign, and by canonicalizing the procedure in a way such that there is only a single nonblocking_assign for each variable, it might become easier to create the register: we could look at a single nonblocking_assign, and use its condition and multiplexer at the data input to determine what the enable, reset, and data inputs would be.

There's probably also an alternative approach where we leave the IR as it is, and perform a more complicated analysis pass over the IR: we would have to track under which conditions each variable is assigned, together with the sensitivity list of the process, and then use that information to create an equivalent moore.register for as many of the variables as possible.

Jul 14 '24 17:07 fabianschuiki

circt circt copied to clipboard

[MooreToCore] Lower moore.variable into hw.wire.

circt
circt copied to clipboard