Evaluation of Pros and Cons of GateMate 4-input Multiplexer over traditional 4-Input LUTs

Open PythonLinks opened this issue 1 year ago • 0 comments

The goal is to demonstrate the advantages of the CPE 4-input multiplexers over the competitors' 4-input LUTs. For this we need an application with a large multiplexer. Fortunately, RISC-V provides a very well-known example: reading one register out of the register file. This figure compares the two approaches.

Image

With the GateMate, we can see 32 bits go to 8 bits to 2 bits to 1 bit, using 8+2+1 = 11 CPEs.
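As a rough check of the 32 to 8 to 2 to 1 reduction, here is a small behavioural model in Python of a 32:1 multiplexer built only from 4:1 multiplexer cells. It is a sketch, not GateMate tooling: `mux4` and `mux32_from_mux4` are hypothetical names, and it assumes that one CPE implements one 4:1 multiplexer, as the figure suggests.

```python
def mux4(inputs, sel):
    """Behavioural 4:1 multiplexer: pick one of up to four inputs with a 2-bit select."""
    padded = list(inputs) + [0] * (4 - len(inputs))  # unused inputs tied to 0
    return padded[sel & 0b11]


def mux32_from_mux4(data, sel):
    """Select data[sel] from 32 one-bit inputs using a tree of 4:1 cells.

    Returns (selected_bit, number_of_mux4_cells_used).
    Assumption: each mux4 call stands for one CPE.
    """
    assert len(data) == 32 and 0 <= sel < 32
    cells = 0
    level = list(data)
    shift = 0
    while len(level) > 1:
        sel_bits = (sel >> shift) & 0b11  # two select bits per tree level
        groups = [level[i:i + 4] for i in range(0, len(level), 4)]
        level = [mux4(group, sel_bits) for group in groups]
        cells += len(groups)  # 8 cells, then 2, then 1
        shift += 2
    return level[0], cells


if __name__ == "__main__":
    data = [0] * 32
    data[21] = 1  # mark register 21
    print(mux32_from_mux4(data, 21))
```

The last level only has two inputs left, so that cell is used as a 2:1 multiplexer with its upper select bit held at 0. The model counts it as a full cell, which is where the 11 comes from.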

With the 4-LUTs, each 4-LUT can only select between 2 bits, so 32 bits go to 16 to 8 to 4 to 2 to 1, requiring 16+8+4+2+1 = 31 4-LUTs per output bit. For the 32 x 32-bit registers in a RISC-V, one read port would need 32 x 31 = 992 4-LUTs. An ALU operation with two arguments needs two read ports, so that would require 1984 4-LUTs. Instead, the small RISC-V cores store registers in memory, which slows performance. That means the GateMate would be faster.
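The cell counts can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch under stated assumptions: one 4-LUT holds one 2:1 multiplexer, one CPE holds one 4:1 multiplexer, each read port needs one 32:1 multiplexer per data bit, and no further packing or optimisation by the tools is considered. The function names are hypothetical.

```python
def mux_tree_cells(n_inputs, fan_in):
    """Count the cells needed to reduce n_inputs to 1 using fan_in:1 multiplexer cells."""
    cells = 0
    remaining = n_inputs
    while remaining > 1:
        remaining = -(-remaining // fan_in)  # ceiling division: cells in this level
        cells += remaining
    return cells


def read_port_cells(n_registers, width, fan_in):
    """One read port needs one n_registers:1 multiplexer per data bit."""
    return width * mux_tree_cells(n_registers, fan_in)


if __name__ == "__main__":
    for name, fan_in in (("4:1 cells (CPE)", 4), ("2:1 cells (4-LUT)", 2)):
        per_bit = mux_tree_cells(32, fan_in)
        one_port = read_port_cells(32, 32, fan_in)
        print(f"{name}: {per_bit} per bit, {one_port} per read port, "
              f"{2 * one_port} for two ports")
```

Under these assumptions the script reports 11 cells per bit, 352 per read port and 704 for two ports with 4:1 cells, against 31, 992 and 1984 with 2:1 cells. Changing `width` to 64 gives the figures for a 64-bit core.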

In regard to 4-input multiplexers, the GateMate performs just as well as those larger (more expensive) FPGAs with 6-input LUTs, since a 4:1 multiplexer has 4 data inputs plus 2 select inputs and so fills one 6-input LUT.

There are several RISC-V soft cores which could be used to demonstrate this advantage.

Checking "Awesome GateMate", there are three RISC-V cores on GateMate. When I started looking into these issues, on Dec 28th, 2024, one of the two FemtoRV authors said that "Without pipelines given the memory model, two clock cycles per instruction is possible when using quark-bicycle." Looking closer, the Bicycle has a full register set.

reg [31:0] registerFile [31:0];

Femto RV Bicycle

So that would be a good comparison. I think that such a demo would make a great sales pitch for the advantages of the Cologne Chip GateMate over similarly priced FPGAs.

The second RISC-V on GateMate is the NEORV32. From the documentation:

The data register file contains the general purpose architecture registers x0 to x31. For the rv32e ISA only the lower 16 registers are implemented.

A web search says:

"RV32E is a reduced version of RV32I for embedded systems, with 16 integer registers and soft-float calling convention. It uses the same instruction-set encoding as RV32I and can be combined with standard extensions."

Here are the notes on porting NEORV32 to GateMate.

https://github.com/stnolting/neorv32/discussions/983

and

https://github.com/stnolting/neorv32-setups/tree/main/cologne_chip/GateMateA1-EVB

Sadly, no resource utilization figures are published. If it is indeed the I version (32 registers), and not the smaller E version (16 registers), that would be another good comparison.

There is also the EduBoss, but sadly that page does not quote the resources for GateMate, and all the quoted examples store the register file in SSRAM. There is, however, the option to store the register file in LUTRAM, so that might make a good third comparison. Really, all three comparisons should be done.

Better yet, do a 64-bit RISC-V. With two read ports, that would require 2 x 64 x 31 = 3968, so almost 4000, 4-LUTs.

Which is the best demo? Well, whichever one someone first volunteers to do. Based on what little I know, I would recommend the FemtoRV Bicycle. FemtoRV is quite well known. It has made it onto my list of RISC-V soft cores: https://github.com/PythonLinks/awesome-risc-v-soft-cores

There is a great video describing how one of the FemtoRV soft cores works. https://www.youtube.com/watch?v=8boamDdvD8s&t=366s

I took a shot at firing up the Bicycle on iCE40 (which I know well). Here is the bug report: https://github.com/BrunoLevy/learn-fpga/issues/125

PythonLinks avatar Feb 27 '25 17:02 PythonLinks