nmigen
WIP: Expand and document lib.cdc
Very much a work in progress, but I wanted to open early to get feedback on the approach/direction. I'm porting/rewriting some of the missing modules from migen.genlib.cdc.
Currently in this patch set:
- Add docstrings to the existing modules. I tried to copy the "house style" I saw elsewhere.
- Basic checking to give more meaningful traces when you do something like requesting a MultiReg of length 0. Again, I tried to copy the idioms I saw elsewhere.
- Implement PulseSynchronizer
- Implement Gearbox
Known issues:
- Smoke tests only; no formal checks
- Checks are limited by single-clock simulation -- maybe it was a bad idea to play with the CDC library first :)
- I'm passing in the domain names rather than using DomainRenamer. Not sure whether it's best to leave this PR open until this is in place, or merge early so people can start hacking on this. Not my decision :)
- The storage size calculation in this version of Gearbox is different to the one in Migen. IMO the old one was unsafe, but this one may be too conservative. I could also be plain wrong!
- (edit:) The output mux on the Gearbox is driven from two clock domains. This is a legitimate thing to do here, but I saw mention somewhere that doing this should cause an error.
I definitely still intend to port:
- BusSynchronizer
- ElasticBuffer
And we probably ought to do something about GrayCounter at some point, but I think it's less important than the others.
As a general style question, there seem to be two ways of doing module wiring in `nmigen.lib` at the moment. One is to pass external signals into `__init__`, which gives you more of a Verilog-style instantiation. The other is to create "port" signals in `__init__`, which the parent then wires up during elaboration. I've used the second style because it seems a bit cleaner, and doesn't require creating extraneous signals like in Verilog, but not sure if this is the right thing to do?
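For concreteness, a minimal stand-alone sketch of the two styles. The `Signal` class here is a trivial stand-in so the example runs on its own (in real nMigen it would be `nmigen.Signal`, and the classes would be elaboratable modules); the class names are hypothetical.

```python
# Stand-in for nmigen.Signal, just enough to illustrate the wiring styles.
class Signal:
    def __init__(self, width=1, name=None):
        self.width = width
        self.name = name

# Style 1: Verilog-style instantiation; the caller creates the signals
# and passes them into __init__.
class SyncStyle1:
    def __init__(self, i, o):
        self.i = i
        self.o = o

# Style 2: "port" signals are created in __init__; the parent wires
# them up during elaboration, with no extraneous caller-side signals.
class SyncStyle2:
    def __init__(self, width=1):
        self.i = Signal(width, name="i")
        self.o = Signal(width, name="o")

ps = SyncStyle2(width=4)
```

With style 2, the parent simply connects `ps.i` and `ps.o` to its own logic during elaboration.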
Codecov Report
Merging #40 into master will increase coverage by 0.64%. The diff coverage is 100.00%.
```diff
@@            Coverage Diff             @@
##           master      #40      +/-   ##
==========================================
+ Coverage   80.70%   81.34%   +0.64%
==========================================
  Files          32       32
  Lines        4976     5158     +182
  Branches     1071     1105      +34
==========================================
+ Hits         4016     4196     +180
- Misses        825      829       +4
+ Partials      135      133       -2
```
| Impacted Files | Coverage Δ | |
|---|---|---|
| nmigen/lib/fifo.py | 98.76% <ø> (ø) | |
| nmigen/lib/cdc.py | 100.00% <100.00%> (+13.33%) | :arrow_up: |
| nmigen/compat/fhdl/structure.py | 63.01% <0.00%> (-2.21%) | :arrow_down: |
| nmigen/hdl/dsl.py | 99.28% <0.00%> (-0.01%) | :arrow_down: |
| nmigen/hdl/xfrm.py | 96.21% <0.00%> (+0.01%) | :arrow_up: |
| nmigen/back/pysim.py | 96.64% <0.00%> (+0.02%) | :arrow_up: |
| nmigen/build/plat.py | 23.60% <0.00%> (+0.03%) | :arrow_up: |
| nmigen/back/rtlil.py | 86.26% <0.00%> (+0.05%) | :arrow_up: |
| nmigen/hdl/ast.py | 86.76% <0.00%> (+0.05%) | :arrow_up: |
| nmigen/hdl/ir.py | 94.33% <0.00%> (+0.07%) | :arrow_up: |
| ... and 3 more | | |
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ad1a40c...2e6cf3c. Read the comment docs.
Last commit is just to fix a small coverage black spot in the existing code. I was having far too much fun with the graphs on Codecov.
BusSynchronizer is as per Migen except:
- number of synchronisation stages is optionally parameterised
- timeout retry is optional by setting timeout to 0 (since it is a bit risky, can cause metastabilities in capturing domain)
- Timer is not factored out into a module, since nMigen does not have a `misc` library right now
Again it needs formal checks, and tests are devalued by running them in one clock domain.
Edit: updated the commit. Previously there was a clock-enabled odomain register driven by a clock-enabled idomain register. This is ok for CDC purposes, because the CDC-safe handshake controls the clock enables, but I've also put in a (shortened) `MultiReg` between these registers, because "all cross-domain signals go through a `MultiReg`" seems like a useful property.
Edit: squashed in another small change: making sure that `sync_stages` gets passed on to the `MultiReg` for the width=1 special case.
Add `ElasticBuffer`. Faithful reproduction of the Migen version, except without the reset synchronisers (since these would normally be per-reset-domain, and there is still some code motion wrt reset/clock domains in nMigen as a whole).
Edit: also, this elastic buffer is unusual; usually there is some provision for slipping the pointers to account for long-term unbounded phase change. This would be more light-weight than an `AsyncFIFO` but only works for a few PPM difference. I'm implementing as-is because something like that would have a different interface to the current `ElasticBuffer`.
Force push due to finding small issues in earlier commits, e.g. gearbox increment didn't work for non-pow-2 chunk counts (which can't be tested in one clock domain, I just spotted it while reading the code). I've tried to keep it so that each commit can be separately checked out, tested and reviewed. If you'd rather I just push fixes, let me know :)
Also, I've copied the `_incr` used in `lib.fifo`. Might be worth factoring this out, but that's outside of this PR.
> - I'm passing in the domain names rather than using DomainRenamer. Not sure whether it's best to leave this PR open until this is in place, or merge early so people can start hacking on this. Not my decision :)
I'll try to fix DomainRenamer to work on Modules as soon as possible!
> As a general style question, there seem to be two ways of doing module wiring in `nmigen.lib` at the moment. One is to pass external signals into `__init__`, which gives you more of a Verilog-style instantiation. The other is to create "port" signals in `__init__`, which the parent then wires up during elaboration. I've used the second style because it seems a bit cleaner, and doesn't require creating extraneous signals like in Verilog, but not sure if this is the right thing to do?
I use the second style, with some exceptions like MultiReg. The reason is that the signals aren't always completely independent, for example you might want to pass a width and have a set of signals with that width, double that width, etc created for you. Also, constructors with a large number of arguments are unwieldy and bug-prone, especially if you change the constructor later.
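The "signals derived from a width parameter" point can be illustrated with a short stand-alone sketch (`Signal` is a minimal stand-in, not `nmigen.Signal`, and `Widener` is a hypothetical module name):

```python
# Stand-in for nmigen.Signal, just enough to show derived widths.
class Signal:
    def __init__(self, width=1):
        self.width = width

class Widener:  # hypothetical: one width parameter derives both ports
    def __init__(self, width):
        self.i = Signal(width)       # input at the base width
        self.o = Signal(2 * width)   # output at double the width
```

Passing `i` and `o` in from outside would instead require every caller to get the width relationship right.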
> I'll try to fix DomainRenamer to work on Modules as soon as possible!
Ok, no sweat, I'll rebase + fix as and when :)
Rebased on latest master and fixed up docstrings:
- Reflowed to 100 characters
- Fixed ReST formatting
- Americanized spellings to be consistent with class names
- Some edits for clarity
I also stared at the code for a while; the only minor fix was something that really oughtn't be retimed, but didn't have the `no_retiming` attribute. Not saying there's nothing to fix, just that it's reached the point where squinting at my own code is nonproductive.
Domains are passed as strings; will fix this after the `xfrm.py` changes go in.
@Wren6991 Done!
Thanks :) I've fixed up PulseSynchronizer and Gearbox so far. Unfortunately I'm away on a trip at the moment and don't have all the necessary tools set up on my work laptop, so it'll take me a while to get the fixes tested and pushed.
For consistency I'm using "read" and "write" for default domain names (matching AsyncFIFO), but there is a not-totally-frivolous argument to use something like "i" and "o" instead:
- Matches the signal naming convention used in e.g. `lib/coding.py`. Makes sense for `i` to be in `"i"`, `o` in `"o"`, etc.
- Having domains be the same number of characters is super nice. I have bad habits from RTL of lining up blocks of assignments in my code, and it's nice if this happens without inserting whitespace
- `DomainRenamer({"i": "gpu", "o": "display"})(MyCDCModule())` is just a bit pithier
Was just something that occurred to me when editing the files, it's not up to me.
> Thanks :) I've fixed up PulseSynchronizer and Gearbox so far. Unfortunately I'm away on a trip at the moment and don't have all the necessary tools set up on my work laptop, so it'll take me a while to get the fixes tested and pushed.
> For consistency I'm using "read" and "write" for default domain names (matching AsyncFIFO), but there is a not-totally-frivolous argument to use something like "i" and "o" instead:
Now that I think more about it, I think it doesn't make sense to use `DomainRenamer` for synchronization primitives. The reason being, for these primitives, you always want to rename them, so requiring each caller to do that work is clearly superfluous, and it results in unnecessarily hard-to-read code too.

There's a good argument that `PulseSynchronizer` should work the same as `AsyncFIFO`. I agree. I think `AsyncFIFO` should use explicit domain specification as well! Pretty much every instance of `AsyncFIFO` is immediately followed by a `DomainRenamer` call.
WDYT?
So a flow could be:
- If you are writing a block of synchronous logic, you use `sync`, and then apply `DomainRenamer` to yeet it into the correct domain
- Knowledge of domains is confined to a specific region of your code, which places synchronous blocks into the correct domains and also glues them together with CDC modules
- Since CDC + `DomainRenamer` is actually a single constructor in a trenchcoat, skip the extra step and pass domains in directly
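The third option above could be sketched as follows. This is a hypothetical shape only (the class body is a stand-in, and the argument names were still being decided at this point in the discussion):

```python
# Sketch of "CDC + DomainRenamer is a single constructor in a
# trenchcoat": the primitive takes its domain names directly, instead
# of being constructed with defaults and renamed afterwards.
class PulseSynchronizer:
    def __init__(self, i_domain="i", o_domain="o", sync_stages=2):
        self.i_domain = i_domain
        self.o_domain = o_domain
        self.sync_stages = sync_stages

# No separate DomainRenamer step needed:
ps = PulseSynchronizer(i_domain="gpu", o_domain="display")
```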
If so, that feels cleaner, yeah!
Yep!
> to yeet it into the correct domain
Incredible.
Awesome. I guess the next step for this PR is a lot more testing, then. I did note #28, which I guess I could start looking into at some point, as this PR bumps against it. It would be good to learn about that part. Formal would be good too.
For CDC it would also be cool to do some soak testing on real FPGAs, to catch genuine CDC bugs rather than just state machine bugs (albeit they're difficult to debug once found). Not sure if this is something you already had ideas on.
> For CDC it would also be cool to do some soak testing on real FPGAs
The CDC primitives in oMigen are extremely well tested. I think we mainly need to make sure the nMigen ones don't stray from them, as happened with AsyncFIFO...
Ok that's a good point. Will read through the oMigen library again later today. For the most part it's pretty faithful but mistakes creep in!
There are things in the old library that seemed odd though, e.g. for Gearbox with `lcm(iwidth, owidth) == iwidth`, the write port will have depth 2. The pointers start 180° apart, but there is no guard period between the final o read of an ichunk and the next write to that ichunk, or conversely, the first read from an ichunk and the preceding write. You're at the mercy of a dice roll on the two reset synchronisers. I'm currently using a different storage size calculation because of this (it can end up larger or smaller than oMigen's depending on the ratio).
I'll definitely make sure the logic is all ok, and then maybe raise questions about things like the above.
Sounds good!
I did a side-by-side of oMigen vs current state of my code
- `PulseSynchronizer`
  - Actual logic identical
  - Number of sync stages in `MultiReg` is parameterised
  - Currently I don't have the `reset_less` attribute on the non-CDC flops in this module, but the MultiReg defaults to `reset_less=True`. It would probably be better to parameterise this, and either have everything or nothing `reset_less`
- `BusSynchronizer`
  - Handshake logic is identical: req/ack with two PulseSynchronizers
  - Number of sync stages parameterised
  - Option to disable timeout, as it has a small chance of temporarily breaking the handshake for oclock << iclock (or oclock sometimes gated); multiple `req`s can be in flight!
  - I removed one layer of flops from the data CDC path. See diagram
  - Again some confusion over what should/should not be `reset_less`. Currently all synchronous logic is resettable but the `MultiReg`s are not
  - Found an issue with my code: `ibuffer` clock enable was driven by only `_pong.o` (ack, resynced to idomain) in oMigen, but mine was more permissive, and also enabled on timer restarts and the start-of-day flag. Think this is ok, but I've fixed it to be exactly like oMigen
- `ElasticBuffer`
  - Logic is a faithful reproduction of oMigen
  - oMigen code also has some reset structure inside. It ORs together the reset from each domain, and resynchronises this new reset internally. I think this is problematic because reset assertion is asynchronous, so e.g. an input-domain reset can inject metastabilities into the output domain. This is better handled by proper reset sequencing at system level. WDYT?
- `Gearbox`
  - Same comment on resets as `ElasticBuffer`
  - Actual logic is faithful to oMigen
  - Storage size calculation is different in my version: oMigen takes the LCM of the two widths, and doubles it unconditionally. This is unnecessarily large for many coprime widths (e.g. 5:4 for video), and for `lcm(i, o) in (i, o)` it seems too small to me. My code starts with `lcm(i, o)` and doubles it until the depth of each port is at least four, the logic being that this gives you one chunk of no-man's-land on each side between the chunk you are reading/writing and the chunk the other side is writing/reading. WDYT?
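The sizing rule described above can be sketched in pure Python. This is my reading of the description, not the actual implementation, and the function name is made up:

```python
from math import gcd

def gearbox_storage_bits(iwidth, owidth):
    # Hypothetical sketch: start from lcm(iwidth, owidth) and double
    # until each port is at least 4 chunks deep, leaving a chunk of
    # no-man's-land on either side of the chunk currently being
    # accessed by the other domain.
    storage = iwidth * owidth // gcd(iwidth, owidth)  # lcm
    while storage // iwidth < 4 or storage // owidth < 4:
        storage *= 2
    return storage

# 5:4 (e.g. video): lcm is already 20 bits (write depth 4, read depth
# 5), so no doubling is needed -- smaller than oMigen's unconditional
# 2 * lcm = 40. For 8:8, lcm is 8, so it doubles twice to reach 32.
```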
BusSynchronizer datapath
I drew a rough box diagram of oMigen `BusSynchronizer`:
My problem with this is that there is a 3-flop delay for both the request and the data (1 flop in `idomain`, two in `odomain`). This means there is a race between the `obuffer` clock enable and the data. I.e. due to bit skew observed at the inputs to the req and data `MultiReg`s, it seems possible to have req = 1 at the output but garbled data.

I've fixed the race by removing one stage from the datapath `MultiReg`. Note there are still two non-retimable layers of flops in `odomain` on the datapath, and also, `sync_stages` can be increased to stretch out the entire `BusSynchronizer`.
@Wren6991 Sorry, I completely missed your update, GitHub decided to not send me an email for it for some reason.
Do you agree on my diagram of oMigen BusSynchronizer? I might have misunderstood e.g. the PulseSynchronizer logic. Haven't looked at it for a while now.
I'll need to think more about it as I wasn't the one who wrote it. @sbourdeauducq ?
> I've fixed the race by removing one stage from the datapath `MultiReg`.
But then, how do you preserve the MultiReg constraints between the anti-metastability registers?
Sounds easier to add one register of delay into the request path.
@sbourdeauducq You're 100% right -- it's possible to just falsepath/MCP the connection between write data buffer and the MultiReg, but it's cleaner if it can just rely on constraints on paths internal to the MultiReg.
Added a patch which restores the data path to its original length, and lengthens the request path to compensate.
Fixed merge conflict and made classes derive from Elaboratable (sorry for force-push)
Thanks, I'll take another look soon.
Updated naming conventions according to #97. I've also updated the only use of `lib.cdc` elsewhere in the standard library (in `lib.fifo`) to reflect this. Sorry for being unresponsive, day job has been busy.
That's fine, I have a lot of other work here, so you're not really blocking anything.
I saw this in #113
> CDC improvements like CDC safety (#4) and automatic false path constraints (#87; depends on #88). These are definitely needed, but there are still plenty of designs that will be just fine without them, and I think for most people waiting to pick up nMigen they wouldn't be an issue. Again, these are more of 0.2 issues.
And have also noted #87. Wanted to mention that falsepath is not necessarily safe for all of these primitives: in particular, BusSynchronizer is unsafe if there is too much skew between the request path and the datapath, and false paths can accumulate huge routing delays when FPGA utilisation is high.
Since we want the widest possible toolchain support, one option is a maxdelay constraint of capture period minus setup. This isn't perfectly safe (clock skew), and does make closure harder, but might be worth considering.
> Wanted to mention that falsepath is not necessarily safe for all of these primitives
That seems very troublesome, and certainly something we want to loudly warn about. Would you be able to prepare a detailed evaluation of all of our CDC primitives with this in mind?
> BusSynchronizer is unsafe if there is too much skew between the request path and the datapath, and false paths can accumulate huge routing delays when FPGA utilisation is high.
Are there any ways to work around this?
> Since we want the widest possible toolchain support, one option is a maxdelay constraint of capture period minus setup. This isn't perfectly safe (clock skew), and does make closure harder, but might be worth considering.
This seems like a dramatic increase in complexity of platform code. It's an option, but maybe we can avoid it completely...
Relevant: https://github.com/m-labs/migen/commit/f4979a2403316942909bdfdd77e2a174a2b77857#commitcomment-34195474
That is an interesting handshake, but seems to overlap quite a lot with BusSynchronizer. I have had the occasional shower thought that BusSynchronizer should have an optional valid/ready handshake in the launching domain (and perhaps a valid pulse in the capturing domain), which would give you something similar to this BlindTransfer.
I'm also happy to just port BlindTransfer wholesale!