clash-compiler Add Xilinx dual clock FIFO primitive to `clash-cores`

This adds Xilinx's DcFifo to clash-cores. The Haskell model and the IP are both tested.

Past discussion is here; it is bit cluttered.

It's a draft PR because it depends on https://github.com/clash-lang/clash-compiler/pull/2257

Still TODO:

[ ] ~Write a changelog entry (see changelog/README.md)~ (clash-cores isn't on Hackage)
[x] Check copyright notices are up to date in edited files
[ ] Fix full behavior and add a test for it
[ ] Fix overflow behavior and add a test for it
[x] Remove unused / obsolete code
[x] Test reset+full behavior

Jul 05 '22 14:07 vmchale

@DigitalBrains1 and I just talked a little about the tests that are currently there. Peter noticed that we only test the happy path; that is, when both the write side and the read side behave properly with regards to the full/empty signals:

Not writing when full is asserted
Not reading when empty is asserted

We should add two basic tests:

What happens when you write when full is asserted. Both the Haskell model and Xilinx model should drop the incoming write, but otherwise continue functioning correctly.
What happens if you read when empty. This is mostly relevant for the Haskell model: it should continue to work when this happens. I.e., the simulation shouldn't crash completely.

Jul 08 '22 14:07 martijnbastiaan

Oh fair! I will get on that.

Jul 08 '22 14:07 vmchale

Ok just pushed those two. It now tests that ordering remains upon overflow, and it tests that the simulation completes when it underflows, that is, the DcFifo doesn't depend on undefined values to progress!

Jul 08 '22 17:07 vmchale

What I have now is inelegant - it has multiple modules since I can't simulate multiple top-level entities in vivado. This is because of https://github.com/clash-lang/clash-compiler/issues/2264 (which I would like to put off for the moment).

I still need to see what happens in RTL when the driver circuit doesn't respect the full signal.

Jul 11 '22 12:07 vmchale

This is because of https://github.com/clash-lang/clash-compiler/issues/2264 (which I would like to put off for the moment).

Seems sensible

Jul 11 '22 12:07 martijnbastiaan

This depends on my hack here, but now has the thorough tests (RTL)

Jul 11 '22 14:07 vmchale

Ok, the RTL tests:

Test of different (and same) clock domains, read>write and write>read.
Test that nothing goes "backwards" even if we write when the FIFO is full. So we don't always get the expected data out, but we test that it's in the right order.

Jul 11 '22 14:07 vmchale

What I have now is inelegant - it has multiple modules since I can't simulate multiple top-level entities in vivado. This is because of https://github.com/clash-lang/clash-compiler/issues/2264

That's not the only reason. If you want to have multiple top-level entities simulated from the same clash (gen) then those will need separate temporary directories and obviously also separate test names. I was planning on adding that anyway since I want to use it for Xilinx Floating in clash-cores.

Our SymbiYosys and Vivado CI environments both only support a single element in the list BuildSpecific [String], the other tools all correctly handle the case of a list with multiple elements. I wasn't planning on fixing SymbiYosys until we actually need it, there's more important things to do.

Note: I did not understand #2264 at a glance, so I might be missing something you think is obvious ;-)

Jul 14 '22 10:07 DigitalBrains1

@DigitalBrains1 sorry about the line-length :grimacing:, I've got a script to pinpoint things now

Jul 25 '22 15:07 vmchale

I see I missed quite a few things in my review, sorry. I'll add them as TODOs to the PR description. Thanks @DigitalBrains1 for being vigilant, I should have been more careful.

Jul 26 '22 14:07 martijnbastiaan

Thank you! I've cherry-picked that commit.

Jul 27 '22 12:07 vmchale

I suddenly notice we declare the VHDL component as follows:

    component topEntity_dcfifo_C5BC86E64AAA3F54 port                            
      ( wr_clk : in std_logic                                                   
      ; wr_rst : in std_logic                                                   
      ; rd_clk : in std_logic                                                   
      ; rd_rst : in std_logic                                                   
      ; din : in std_logic_vector                                               
      ; wr_en : in std_logic                                                    
      ; rd_en : in std_logic                                                    
      ; dout : out std_logic_vector                                             
      ; full : out std_logic                                                    
      ; empty : out std_logic                                                   
      ; rd_data_count : out std_logic_vector                                    
      ; wr_data_count : out std_logic_vector );                                 
    end component;

while the Vivado instantiation template says:

COMPONENT fifo_generator_0
  PORT (
    wr_clk : IN STD_LOGIC;
    wr_rst : IN STD_LOGIC;
    rd_clk : IN STD_LOGIC;
    rd_rst : IN STD_LOGIC;
    din : IN STD_LOGIC_VECTOR(17 DOWNTO 0);
    wr_en : IN STD_LOGIC;
    rd_en : IN STD_LOGIC;
    dout : OUT STD_LOGIC_VECTOR(17 DOWNTO 0);
    full : OUT STD_LOGIC;
    empty : OUT STD_LOGIC;
    rd_data_count : OUT STD_LOGIC_VECTOR(9 DOWNTO 0);
    wr_data_count : OUT STD_LOGIC_VECTOR(9 DOWNTO 0) 
  );
END COMPONENT;

Note how they constrain the size of the vectors. According to Rinse, this is no problem, and I can actually synthesise the design, but I suspect it was not deliberate; we generally constrain our vector sizes, right? The problem is not with the invocation of compInBlock as it is getting the sizes:

  -- -1 for maybe
  let dataTy = N.BitVector (DSL.tySize (DSL.ety wDataM) - 1)
  let
    compInps =
      [ ("wr_clk", N.Bit)
      , ("wr_rst", N.Bit)
      , ("rd_clk", N.Bit)
      , ("rd_rst", N.Bit)
      , ("din", dataTy)
      , ("wr_en", N.Bit)
      , ("rd_en", N.Bit)
      ]
    compOuts = catMaybes
      [ Just ("dout", dataTy)
      , Just ("full", N.Bit)
      , Just ("empty", N.Bit)
      , dcReadDataCount `orNothing` ("rd_data_count", N.BitVector depth)
      , dcWriteDataCount `orNothing` ("wr_data_count", N.BitVector depth)
      , dcUnderflow `orNothing` ("underflow", N.Bit)
      , dcOverflow `orNothing` ("overflow", N.Bit)
      ]

-- [...]

    DSL.compInBlock dcFifoName compInps compOuts

PS: this component is configured differently than currently in the PR; but the PR will catch up with it later.

Aug 20 '22 05:08 DigitalBrains1

Fix for that seems trivial:

--- a/clash-lib/src/Clash/Backend/VHDL.hs
+++ b/clash-lib/src/Clash/Backend/VHDL.hs
@@ -1504,7 +1504,7 @@ inst_ (CompDecl nm ps0) =
   fmap Just $ "component" <+> pretty nm <+>
     ("port" <> line <> indent 2 (tupledSemi ps <> semi))
     <> line <> "end component" <> semi
-  where ps = traverse (\(t,pd,ty) -> pretty t <+> ":" <+> ppd pd <+> qualTyName ty) ps0
+  where ps = traverse (\(t,pd,ty) -> pretty t <+> ":" <+> ppd pd <+> sizedQualTyName ty) ps0
         ppd = \case { In -> "in"; Out -> "out"}
 
 inst_ (CondAssignment id_ _ scrut _ [(Just (BoolLit b), l),(_,r)]) = fmap Just $

Aug 22 '22 11:08 DigitalBrains1

@DigitalBrains1 yes! I came to the same conclusion, I have a PR just now.

Aug 22 '22 11:08 vmchale

I pushed three commits:

I refactored the code such that we can again use $(deriveTermToData ''DcConfig). Martijn and I thought this was the cleaner solution. It will really pay off if we later extend the configuration to more options which are encoded as an ADT to allow the user to only specify meaningful combinations of configurations.
I made the way we specify IpConfig properties more pleasant to read and write. In the next commit, I add a lot of properties. I defined TclShow instances for all types I expect to occur in properties. If you realise I missed a likely candidate, please say so.
The big one: we now use synchronous resets. The documentation (product guide) on asynchronous resets is just too ambiguous and inconsistent. If you take a favourable reading, picking the most sensible option for every oddness, then you get a nicely working system with asynchronous resets. If you take a conservative approach and say "I'd better do it like this because that way it has to go right whatever they meant by that odd bit in the product guide", you get such a complex beast that it is easier to do the synchronisation yourself. So, let's do that. We have synchronous resets, and the section on synchronous resets in the product guide is pretty clear. A proper reset sequence that respects the conditions in the product guide can easily be achieved. The specific proper reset sequence to apply in practice is left to the user (who should read the product guide carefully anyway). But in a later commit, I'll add documentation to the stuff.

I discussed proper testing with Christiaan, and we've decided just not to do any reset at all in the test bench (note that resetting is optional for this FIFO configuration anyway; we need the functionality to flush the FIFO, but the FIFO doesn't care). I will create an issue after this PR is merged noting that we still need to test resets in the test bench. I think a well-designed test bench for resets is not trivial, and I want to get this stuff merged without the engineering effort involved. I trust we do it correctly anyway, I don't expect the test bench to hit any issues (*knocks on wood*).

The FIFO will only work with domains that have a synchronous reset, but the constraint for that at the type level is too painful to carry around. Rather, both simulation and generating HDL will call error.

I removed HLINT ignore camelCase because it isn't relevant anymore.

I reordered the imports not just to conform with our latest style guide, but also because I could not determine the principle behind the existing ordering and grouping, so I had no clue where to add imports ;-). Vanessa, if you want to create subgroups within the major groups defined by the style guide, you'll need to redo that.

Note that I reformatted this bit: BlackBoxes.hs:

-- see lines 51-56, 102-107, 202-205 where these args are used.
[2,7,8,9,10,11,12]

The problem with this kind of comment is that it is incredibly high maintenance and in fact it is likely to be overlooked when changing the file, making the line numbers no longer match up. Showcasing this: the line numbers were already out of date and incorrect!

Aug 23 '22 15:08 DigitalBrains1

Great, thanks! I don't have any objections, just questions, I'll have a review shortly.

Aug 24 '22 17:08 vmchale

With this comment:

Screenshot 2022-09-12 at 12-13-43 Add Xilinx dual clock FIFO primitive to clash-cores by vmchale · Pull Request #2270 · clash-lang_clash-compiler

I replied to a comment that asked whether we should make a CHANGELOG entry for this.

Sep 12 '22 10:09 martijnbastiaan

@martijnbastiaan yes! I added the changelog entry that mentioned records but I was confused by @DigitalBrains1's comment:

we can now process records and type-level naturals

I don't see how the new code allows us to process type-level naturals.

Sep 12 '22 10:09 vmchale

I don't see how the new code allows us to process type-level naturals.

It was loose phrasing to refer to the fact we can now do termToData on data with an SNat at least when you define it as

data DcConfig depth = DcConfig { dcDepth :: !depth, ... }

and then supply an SNat n as depth, thanks to the addition of TermLiteralSNat.

I suppose your confusion is that you thought I referred to the problematic formulation of:

data DcConfig' depth = DcConfig' { dcDepth' :: !(SNat depth), ... }

which unfortunately doesn't work?

Sep 12 '22 11:09 DigitalBrains1

Oh yes I see, thank you! I have just pushed the full changelog.

Sep 12 '22 11:09 vmchale

@vmchale The current commit message isn't valuable for future code archeology. A good commit body should mention:

What change was introduced
Why the change was introduced
Explain shortcomings / future work statements (although more often than not this is what inline -- TODO comments are for)

These aren't strict rules - sometimes reasons are completely obvious (like adding support for a Xilinx IP). Sometimes the change is fully explained in the code diff itself (do not assume code is self-explanatory though!). Sometimes there's no future work the author can think of.

Finally, in the footer it should refer to:

Any issues (if applicable)
Any co-authors (if applicable)

Oct 22 '22 07:10 martijnbastiaan

Ah I'm so sorry I didn't realize the Co-authored-by lines had to be at the end of the commit 😬

Oct 24 '22 10:10 vmchale

The behavior of full/empty signals in the Haskell model is not tested.

What do you mean? It's in the test bench, right? I have to admit those are only run automatically in Vivado, which is a shortcoming we should one day fix, but I did run them manually in Haskell as well. Do you mean the test bench doesn't check cross-domain timing? That reminds me. The Haddock mentions

Note that the behavior of the FIFO in Haskell simulation does not correspond exactly to the behavior in HDL simulation, and that neither behaviors correspond exactly to hardware. All the different behaviors are functionally correct and equivalent when the usage guidelines in the product guide are observed.

and the original PR cover letter said

We don't care too much about having the Haskell model correspond 1:1 to the RTL for our client's project. Please de-prioritize.

I think a message to this effect should be in the commit message.

Oct 24 '22 11:10 DigitalBrains1

clash-compiler clash-compiler copied to clipboard

Add Xilinx dual clock FIFO primitive to `clash-cores`

Still TODO:

clash-compiler
clash-compiler copied to clipboard