calyx icon indicating copy to clipboard operation
calyx copied to clipboard

`tdcc`: Incorrect FSM generated in presence of `par`

Open rachitnigam opened this issue 4 years ago • 4 comments

The following program incorrectly computes the final value in y_int0 to be 0. The expected value is 1:

import "primitives/core.futil";
component main(@go go: 1, @clk clk: 1, @reset reset: 1) -> (@done done: 1) {
  cells {
    f0 = std_reg(4);
    @external y_int0 = std_mem_d1(1, 1, 1);
    comb_reg = std_reg(1);
  }
  wires {
    group zero_f0 {
      f0.write_en = 1'd1;
      f0.in = 4'd0;
      zero_f0[done] = f0.done;
    }
    group false<"static"=1> {
      comb_reg.in = 1'd0;
      comb_reg.write_en = 1'd1;
      false[done] = comb_reg.done ? 1'd1;
    }
    group true<"static"=1> {
      comb_reg.in = 1'd1;
      comb_reg.write_en = 1'd1;
      true[done] = comb_reg.done ? 1'd1;
    }
    group write_comb_reg {
      y_int0.addr0 = 1'd0;
      y_int0.write_en = 1'd1;
      y_int0.write_data = comb_reg.out;
      write_comb_reg[done] = y_int0.done;
    }
  }

  control {
    seq {
      false;
      par {
        true;
        zero_f0;
      }
      write_comb_reg;
    }
  }
}

The false group sets the value in comb_reg to 0 while true sets it to 1. At the end of the program, we expect the value in comb_reg to be 1 since true will execute on all program paths.

However, tdcc generates the following code for the FSM and the par compilation:

// Group that implements execution of the `par` block
    group par {
      true[go] = !(pd.out | true[done]) ? 1'd1;
      pd.in = true[done] ? 1'd1;
     ...
      par[done] = pd.out & ... ? 1'd1;
    }

Group generated for top-level FSM:

    group tdcc {
      par[go] = false[done] & fsm.out == 2'd0 ? 1'd1;
      par[go] = !par[done] & fsm.out == 2'd1 ? 1'd1;
      ...
    }

The first par[go] statement represents the early transition.

The problem is that both true[done] and false[done] point to comb_reg.done. This creates the following cycle-by-cycle behavior:

  1. cycle 0: false starts executing
  2. cycle f: false is done executing and par starts executing. However, the guard in true[go] = !(pd.out | true[done]) is false because true[done] == false[done]. This also means that pd.in = true[done] ? 1'd1 will execute.
  3. cycle f + 1: true[go] = !(pd.out | true[done]) still doesn't execute because the previous cycle wrote 1 into pd.

This means that true is never executed causing the unexpected output to be generated.

The problem is very similar to a previous bug in tdcc where the done signal from a previous cycle was being used to stop the execution of a group in the current cycle. A possible solution to this problem is exending the guard in the par to be:

true[go] = !(pd.out | (true[done] & fsm.out == 0)) ? 1'd1;

A higher-level statement of this problem is that while registers are allowed to be shared, their done signals cannot be. This class of bus arise from the fact that when we share registers, we invariably end up sharing their done signals as well. The style of fixes to this set of problem is making sure that all the done signals are "indexed" in the right FSM state (i.e. read in the FSM state that they should be read in).

rachitnigam avatar Sep 10 '21 20:09 rachitnigam

Argh, just came across a slightly different version of the above example where the control program is:

    seq {
      false;
      par {
        seq { true; zero_f0; }
        zero_f0;  // Ensures that the `par` block doesn't get compiled away.
      }
      write_comb_reg;
    }

My initial thought was that I could just change the compilation of par so that each child's done whole is only read in the correct FSM state of the surrounding context. However, this example does not work with that change. The problem is that the FSM state from the outer context needs to be threaded through all the way to the true group's execution to ensure that its done signal is only read in the right state.

This means that tdcc needs to change a little more dramatically than I expected to allow for threading the current FSM state and ensuring that done signals are read in the right context.

rachitnigam avatar Sep 13 '21 20:09 rachitnigam

Did this get fixed? If so I'm curious what the correction was

EclecticGriffin avatar Sep 14 '21 19:09 EclecticGriffin

Nope, incorrectly closed. The problems is explained here: https://github.com/cucapra/calyx/discussions/651#discussioncomment-1324881

rachitnigam avatar Sep 14 '21 20:09 rachitnigam

The integration test for durbin is disabled because of this issue: https://github.com/cucapra/calyx-evaluation/commit/115b282c8452a070bf29537b6e0867d8a77573ed

rachitnigam avatar Nov 19 '21 19:11 rachitnigam