ASIC implementation for verilog-ethernet
Hi,
To my understanding, this repo currently targets only FPGA implementation, so I wonder whether your team plans to target ASIC implementation in the near future. Or do you know of any finished or ongoing projects that port the verilog-ethernet modules to ASIC?
Thank you!
Well, I don't make ASICs right now, so I can't really make any attempt to optimize this for ASIC implementation. And there is no team, just me. I don't know of any projects that might be using any of this code on ASICs (or really anything at all for that matter...in general I never hear from anyone who is using my code, so I cannot point you to any projects that use any of this, ASIC or FPGA).
Got it!
Also, in terms of running this Ethernet core on an ASIC, what modifications do you think we need to make? I already know there are some reg initializations and initial blocks in the modules that only work on FPGAs. Can you think of any other issues that we will have to take care of on our own?
Thanks!
Probably all of the inferred RAMs will have to be replaced with explicit module instantiations. Other than that, not sure, I haven't been involved in a tape-out before.
Hi @alexforencich, we've been plugging away at this. We've gone all the way to tape-in (everything except sending it out) with the goal of taping out sometime this year!
The three major areas we've needed to modify are:
- Replacing inferred memories, which cannot be inferred in ASIC processes, with explicit instantiations
- Changing DDR generation in the oddr to use a 2x clock to generate the data and send it with a downsampled 1x clock, as is more common in ASIC processes (it's expensive to get a locked 90 degree clock without a PLL)
- Replacing some initialization statements (reg foo = 1'b1), which are nonsynthesizable in ASIC processes, with explicit resets (always_ff @(posedge clk) if (reset) foo <= 1'b1;)
We would like to contribute these enhancements back so that others can benefit. Can you advise how best to integrate the changes? Our suggestions would be:
- Contribute a mem_1r1w_sync.v and replace in certain modules (large fifos come to mind)
- Add a parameter for using 2x clock instead of 90 degree clock. This could replace the nonsynthesizable "GENERIC" PHY https://github.com/alexforencich/verilog-ethernet/blob/master/rtl/oddr.v#L120 but could also be a separate parameterization for "ASIC"
- Add an `ifdef EXPLICIT_RESET guard around the explicit reset statements so that they are only used on ASIC and do not waste area on FPGA implementations.
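To make the third suggestion concrete, here is a minimal sketch of what a guarded reset could look like. The module name, the port names, and the `foo` register are made up for illustration and are not taken from the repository. Whether the declaration initializer also has to be guarded depends on the ASIC tool, so it is left untouched here.

```verilog
// Hypothetical example module, not part of verilog-ethernet.
module explicit_reset_sketch (
    input  wire clk,
    input  wire rst,    // synchronous reset, only used when EXPLICIT_RESET is defined
    input  wire clear,  // illustrative control input
    output wire foo
);

// FPGA flows normally honor this initializer as the power-up value.
reg foo_reg = 1'b1;

assign foo = foo_reg;

always @(posedge clk) begin
    if (clear) begin
        foo_reg <= 1'b0;
    end

`ifdef EXPLICIT_RESET
    // Placed last so the reset takes priority over the logic above.
    // Compiled in only when the macro is defined (intended for ASIC builds).
    if (rst) begin
        foo_reg <= 1'b1;
    end
`endif
end

endmodule
```

With this pattern an FPGA build leaves `EXPLICIT_RESET` undefined and relies on the initializer, while an ASIC build defines the macro on the tool command line or in a shared header.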
Thanks for the advice!
Adding an explicit memory module is something I have considered, but making that change is going to touch a stupid number of files to replace all of the inferred RAMs. And then the files would have to live somewhere, not sure the best spot for that. As well as naming considerations to avoid collisions with other stuff. But it's worth considering if there is a clean way to do it as it could potentially fix some RAM inference issues with finicky tools.
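For reference, a wrapper along the lines of the proposed mem_1r1w_sync.v could look roughly like the sketch below. Only the module name comes from the suggestion above. The parameters, the port list, and the single shared clock are assumptions made for this example, and the body is a plain behavioral model that an ASIC flow would swap for a memory-compiler macro.

```verilog
// Sketch of a one-read, one-write synchronous memory wrapper.
// Parameter and port names are assumptions, not an existing interface.
module mem_1r1w_sync #(
    parameter DATA_WIDTH = 8,
    parameter ADDR_WIDTH = 4
) (
    input  wire                  clk,
    // write port
    input  wire                  wr_en,
    input  wire [ADDR_WIDTH-1:0] wr_addr,
    input  wire [DATA_WIDTH-1:0] wr_data,
    // read port, data valid one cycle after rd_en
    input  wire                  rd_en,
    input  wire [ADDR_WIDTH-1:0] rd_addr,
    output wire [DATA_WIDTH-1:0] rd_data
);

// Behavioral storage. FPGA tools can infer block or distributed RAM from
// this; an ASIC build would replace this body with a macro instantiation.
reg [DATA_WIDTH-1:0] mem [(2**ADDR_WIDTH)-1:0];
reg [DATA_WIDTH-1:0] rd_data_reg;

assign rd_data = rd_data_reg;

always @(posedge clk) begin
    if (wr_en) begin
        mem[wr_addr] <= wr_data;
    end
    if (rd_en) begin
        // Reading an address written in the same cycle returns the old
        // contents in this model; a real macro may behave differently.
        rd_data_reg <= mem[rd_addr];
    end
end

endmodule
```

Keeping the inferred code in one wrapper like this would confine tool-specific memory handling to a single file, at the cost of touching every module that currently declares its own array.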
I'm not sure the best solution for RGMII, to be honest. A 90 degree clock is certainly not the only way to do things. There needs to be a 90 degree offset between the clock and the data for both the RX and the TX data somewhere. It can come from the connected PHY, from the board trace, or at the MAC end. I think in the original spec it is supposed to be implemented in the board trace, but these days it's common for PHY chips to have an option to turn that delay on or off, in many cases for both RX and TX. On FPGAs it's simple enough to use DDR flip flops, generate 90 degree clocks, and use IODELAY blocks, but I can see how this could be difficult on a custom IC if you don't have library components for that. I suppose on the TX side using a 2x speedup is reasonable, but the PHY only provides a full-rate clock on the RX side so I'm not sure how that should be handled. Also, does a 2x clock even provide the necessary 90 degree phase shift? Seems like you might need a 4x clock to do that, perhaps.
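On the TX phase question, one possible arrangement uses both edges of the 2x clock: data changes on the rising edge and the forwarded clock changes on the falling edge, which puts the clock edges a quarter of the 1x period after the data edges. The sketch below illustrates that idea only. The module and signal names are hypothetical, it assumes d1 and d2 are held stable for a full 1x period, it assumes a reasonably symmetric 2x duty cycle, and it does nothing for the RX side.

```verilog
// Hypothetical TX-only DDR output stage driven by a 2x clock.
// For gigabit RGMII, clk2x would be 250 MHz and clk_out 125 MHz.
module oddr_2x_sketch #(
    parameter WIDTH = 1
) (
    input  wire             clk2x,
    input  wire             rst,
    input  wire [WIDTH-1:0] d1,      // sent first, sampled by the rising edge of clk_out
    input  wire [WIDTH-1:0] d2,      // sent second, sampled by the falling edge of clk_out
    output reg  [WIDTH-1:0] q,
    output reg              clk_out
);

reg             phase;
reg [WIDTH-1:0] d2_reg;

// Data path: one data word per rising edge of clk2x.
always @(posedge clk2x) begin
    if (rst) begin
        phase  <= 1'b0;
        q      <= {WIDTH{1'b0}};
        d2_reg <= {WIDTH{1'b0}};
    end else begin
        phase <= ~phase;
        if (!phase) begin
            q      <= d1;
            d2_reg <= d2;   // hold d2 for the second half
        end else begin
            q <= d2_reg;
        end
    end
end

// Forwarded clock: updated on the falling edge of clk2x, i.e. half a
// clk2x period (a quarter of the 1x period) after each data change.
always @(negedge clk2x) begin
    if (rst) begin
        clk_out <= 1'b0;
    end else begin
        clk_out <= phase;
    end
end

endmodule
```

With a 250 MHz clk2x, each data word is on the pins for 4 ns and each clk_out edge lands about 2 ns after the data changes, which is 90 degrees of the 8 ns period. Whether that margin survives clock tree skew and pad delays would have to be checked in the actual design.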
As for initial states...any important initial state should already have an explicit reset. If I have missed something important, then those resets should just be added in. But in most cases the initial state is actually not important (although in some cases the simulation won't work correctly if a signal starts off as X, even though it works fine with any value other than X or Z). I know some people are of the RESET-ALL-THE-THINGS opinion, but I think that's a slightly different issue.