amaranth icon indicating copy to clipboard operation
amaranth copied to clipboard

Strange simulator behavior: clock signal pausing while sync blocks are speeding up

Open hansfbaier opened this issue 4 years ago • 10 comments

I have the following simulator configuration:

receiver = ADATReceiver()
clk_freq = 120e6
# 24 bit plus the 6 nibble separator bits for eight channel
# then 1 separator, 10 sync bits (zero), 1 separator and 4 user bits
adat_freq = 48000 * ((24 + 6) * 8 + 1 + 10 + 1 + 4)
clockratio = clk_freq / adat_freq
sim = Simulator(receiver)
sim.add_clock(1.0/clk_freq, domain="sync")
sim.add_clock(1.0/adat_freq, domain="adat")
print(f"clock ratio: {clockratio}")

cycles = 10

def sync_process():
    for _ in range(int(clockratio) * cycles):
        yield Tick("sync")

def adat_process():
    testdata = one_empty_adat_frame() + generate_sixteen_frames_with_channel_numbers_in_most_significant_nibble_and_sample_numbers_in_sample()
    for bit in testdata[224:512 * 2]:
        yield receiver.adat_in.eq(bit)
        yield Tick("adat") # if I don't put this here, the adat signal is constant zero

sim.add_sync_process(sync_process, domain="sync")
sim.add_sync_process(adat_process, domain="adat")
with sim.write_vcd('receiver-smoke-test.vcd', traces=[receiver.adat_in, receiver.clk_in, receiver.adat_clk_in]):
    sim.run()

When I simulate this, I have strange parts, where the clock of domain sync seems to pause, but at the same time a counter in that domain seems to speed up:

image

What is going on here? Am I using the simulator in a wrong way, or is this a bug in the simulator?

hansfbaier avatar Dec 03 '20 23:12 hansfbaier

I have absolutely no idea how you managed to get this behavior; I'm assuming a simulator bug.

whitequark avatar Dec 03 '20 23:12 whitequark

Please attach the full code necessary to reproduce this issue.

whitequark avatar Dec 03 '20 23:12 whitequark

@whitequark I just pushed my code. It is my first nontrivial FPGA project, so I am pretty much a newbie here. It is also my first nmigen project. https://github.com/hansfbaier/adat-fpga

hansfbaier avatar Dec 03 '20 23:12 hansfbaier

@whitequark Could you reproduce the behavior with the code I pushed above?

hansfbaier avatar Dec 14 '20 02:12 hansfbaier

Sorry, I have only got around to it now. I get this:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/whitequark/adat-fpga/adat/cli.py", line 62, in <module>
    main()
  File "/home/whitequark/adat-fpga/adat/cli.py", line 29, in main
    args.with_icache, args.icache_nways, args.icache_nlines, args.icache_nwords,
AttributeError: 'Namespace' object has no attribute 'with_icache'

whitequark avatar Dec 14 '20 02:12 whitequark

@whitequark : cli.py is not yet working, I have not come that far yet. While developing, I run the simulator in the main of: ./adat/sources/receiver/receiver.py I can run it from its directory with: /usr/bin/env /usr/bin/python3 receiver.py

Am Mo., 14. Dez. 2020 um 09:39 Uhr schrieb whitequark < [email protected]>:

Sorry, I have only got around to it now. I get this:

Traceback (most recent call last): File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main "main", mod_spec) File "/usr/lib/python3.7/runpy.py", line 85, in _run_code exec(code, run_globals) File "/home/whitequark/adat-fpga/adat/cli.py", line 62, in main() File "/home/whitequark/adat-fpga/adat/cli.py", line 29, in main args.with_icache, args.icache_nways, args.icache_nlines, args.icache_nwords, AttributeError: 'Namespace' object has no attribute 'with_icache'

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/nmigen/nmigen/issues/554#issuecomment-744130083, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABEI72V354A637BAULFMFLSUV3FNANCNFSM4UMVEV6Q .

hansfbaier avatar Dec 14 '20 04:12 hansfbaier

image The behavior seems to happen whenever combinatorial logic operates. Then the clock takes a gasp, but synchronous changes happen anyway, but with "squeezed" timings. Fortunately that did not hinder me completing the receiver. Just pushed all the changes. The design probably can be much simplified, but it is my first non trivial FPGA project, and it just passed the smoke test, wohoo!

hansfbaier avatar Dec 24 '20 04:12 hansfbaier

The design probably can be much simplified, but it is my first non trivial FPGA project, and it just passed the smoke test, wohoo!

Congratulations! I haven't been able to look into this issue yet as I've been quite busy working on CXXRTL, but it's something that I think is very serious, so I'll make sure to address it before the next release.

whitequark avatar Dec 24 '20 04:12 whitequark

As the repository mentioned above is a moving target, here is a snapshot to reproduce the issue:

clk-bug.tar.gz

Steps to reproduce:

  1. cd adat
  2. python3 receiver.py
  3. open the receiver-smoke-test.vcd with gtkwave and look where clk_in drops out

hansfbaier avatar Jan 19 '21 01:01 hansfbaier

I also noticed this. In my case, only the next clock edge is executed (no more than one extra transition).

The code in the zip is quite straight forward: There are two clocks that go through a mux, and for a couple of cycles the output clock is inactive when switching clocks. At the beginning of this period the "off-by-one" error is clearly visible.

I added markers to the following image:

  • A -> Clock 1 stops
  • B -> Extra transition
  • C -> Clock 2 stops
  • D -> Extra transition

Screenshot_20210928_220100

Same as @hansfbaier , run as python clock_bug_test.py. Tested with nmigen 0.3.dev261+g9f78ac0.

clock_bug_test.zip

andresmanelli avatar Sep 28 '21 20:09 andresmanelli

I have debugged both examples. For the original bug report, the issue comes from the following interaction:

  • the clock signal of sync domain is sync_time_detector.clk_in
  • the clock signal of sync is driven by the simulator by add_clock
  • sync_time_detector.clk_in is also combinationally driven from top.clk_in (a const-0 signal due to never being assigned)
  • the driver conflict is not detected (which is a bug tracked as #557)
  • the two drivers fight unpredictably, with the comb driver winning whenever comb logic in top runs for unrelated reasons, sometimes changing value multiple times in 0 simulation time

The other report (from @andresmanelli) has a completely different failure mode, and is instead caused by the glitchy way in which the ClockMux is written:

  • some time after the selector is switched from clka to clkb, clka edge happens while and_1_reg_3 is still high
  • o_clk immediately goes high (being combinationally set to and_1_reg_3 & clka)
  • clka edge triggers logic in clka domain and makes and_1_reg_3 transition low
  • o_clk immediately goes back low

The above happens all within 0 simulation time, making it invisible in VCD. It can be seen using the recently added fs_per_delta option.

In both cases the solution is to be more careful when driving your clock signals.

wanda-phi avatar Apr 11 '24 10:04 wanda-phi

Thank you for making us aware of fs_per_delta I can see us getting a lot of use out of it

Lunaphied avatar Apr 11 '24 13:04 Lunaphied