libsystemctlm-soc

refdesign-sim demo fails on Error: SIGNALS:: Unable to connect top.pcie_bridge.signals-master-tieoff_0.awvalid

Open gricardo99 opened this issue 2 years ago • 7 comments

Hello, I'm trying to run the refdesign-sim demo, connecting to the Xilinx QEMU VM. QEMU is running and waiting for a connection:

(qemu) device_add remote-port-pci-adaptor,bus=rootport1,id=rp0
Failed to connect to 'machine-x86/qemu-rport-_machine_peripheral_rp0_rp': Connection refused
info: QEMU waiting for connection on: disconnected:unix:machine-x86/qemu-rport-_machine_peripheral_rp0_rp,server=on

The host is:

 > g++ --version
g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0

 > verilator --version
Verilator 4.038 2020-07-11 rev v4.036-114-g0cd4a57ad

SystemC 2.3.3

 > ldd tests/rtl-bridges/pcie/refdesign-sim
        linux-vdso.so.1 (0x00007ffd663da000)
        libsystemc-2.3.3.so => /lib/x86_64-linux-gnu/libsystemc-2.3.3.so (0x00007fc0d02df000)

I get a strange error when running refdesign-sim on the host:

> sudo ./refdesign-sim unix:machine-x86/qemu-rport-_machine_peripheral_rp0_rp 1000
        SystemC 2.3.3-Accellera --- Mar 17 2022 13:55:26
        Copyright (c) 1996-2018 by all Contributors,
        ALL RIGHTS RESERVED

Error: SIGNALS:: Unable to connect top.pcie_bridge.signals-master-tieoff_0.awvalid
In file: ../../test-modules/signals-common.h:97

After some debugging, it appears the error originates from these lines in refdesign-sim.cc:

                        snprintf(pname, sizeof(pname) - 1, "m_axi_usr_%d_", bi);
                        signals_m_tieoff[i].connect(ep_bridge, pname);

The verilated ep_bridge module (Vpcie_ep) seems to return only generic port names for its child objects, so the lookup by name fails and this error is reported.
For example, when I print every child object name seen in the signal_find_child function, they are all of the form port_0, port_1, port_2, etc.
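
To illustrate why generic child names break the connection, here is a minimal sketch in plain C++ (no SystemC) of a lookup by name. It assumes the lookup compares each child's name against the prefix plus the signal name. `find_child`, `demo_lookup` and the name lists are hypothetical stand-ins for illustration, not the actual code in signals-common.h.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a name-based child lookup: returns the index of
// the child whose name equals prefix + signal, or -1 if no child matches.
int find_child(const std::vector<std::string>& children,
               const std::string& prefix,
               const std::string& signal) {
    const std::string wanted = prefix + signal;
    for (std::size_t i = 0; i < children.size(); ++i) {
        if (children[i] == wanted) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Small demonstration of both outcomes (all names here are assumptions).
void demo_lookup() {
    // Children that only carry auto-generated names: nothing can match.
    const std::vector<std::string> generic = {"port_0", "port_1", "port_2"};
    assert(find_child(generic, "m_axi_usr_0_", "awvalid") == -1);

    // Children that carry the port variable names: the lookup succeeds.
    const std::vector<std::string> named = {"clk", "resetn",
                                            "m_axi_usr_0_awvalid"};
    assert(find_child(named, "m_axi_usr_0_", "awvalid") == 2);
}
```

In this model the -1 result corresponds to the "Unable to connect" case: the requested name is well-formed, but no child reports it.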

Looking at the generated Vpcie_ep.cpp/h files, the module appears to declare the correct SC ports. These would match the connections made in refdesign-sim.cc if the children of ep_bridge reported the names of the SC module port variables. For example, from tests/rtl-bridges/pcie/obj_dir/Vpcie_ep.h:

    // PORTS
    // The application code writes and reads these signals to
    // propagate new values into/out from the Verilated model.
    sc_in<bool> clk;
    sc_in<bool> resetn;
    sc_out<bool> usr_resetn;
    sc_in<bool> s_axi_pcie_m0_awvalid;
    sc_out<bool> s_axi_pcie_m0_awready;
    sc_in<bool> s_axi_pcie_m0_wvalid;
    sc_out<bool> s_axi_pcie_m0_wready;
    sc_out<bool> s_axi_pcie_m0_bvalid;

Any ideas why the child names in ep_bridge do not match, and instead appear to be generic names of the form port+number, such as 'port_0'?

gricardo99 • Jun 02 '23 22:06

Just an update. I was able to get past the refdesign-sim error that I reported in my initial post, but the workaround suggests that something is wrong with my setup or steps, and I'm not sure what exactly.

After playing around with SystemC (apologies, I'm a SystemC noob), I realized that any sc_in/sc_out constructed without an explicit name gets the kind of generic default name I was seeing (i.e. port_0, port_1, etc.). As a side note, I also moved to SystemC 2.3.2, since that is the version in the instructions and I wanted to rule the version out as a cause (I hit the same initial refdesign-sim error with both 2.3.3 and 2.3.2).

Looking at the verilated ep_bridge module (tests/rtl-bridges/pcie/obj_dir/Vpcie_ep.h), I could see that the ports are not given names and that the default SC_CTOR constructor is used. I hacked up Vpcie_ep.h to comment out the default SC_CTOR constructor and add a constructor whose initializer list passes each port its name.
E.g., in obj_dir/Vpcie_ep.h:

  public:
    // SC_CTOR(Vpcie_ep);
    typedef Vpcie_ep SC_CURRENT_USER_MODULE;
    Vpcie_ep( ::sc_core::sc_module_name ) :
      s_axi_pcie_m0_awvalid("s_axi_pcie_m0_awvalid"),
      s_axi_pcie_m0_awready("s_axi_pcie_m0_awready"),
      s_axi_pcie_m0_wvalid("s_axi_pcie_m0_wvalid"),
      s_axi_pcie_m0_wready("s_axi_pcie_m0_wready"),
      s_axi_pcie_m0_bvalid("s_axi_pcie_m0_bvalid"),
      s_axi_pcie_m0_bready("s_axi_pcie_m0_bready"),
                    ...etc...
      s_axi_usr_5_ruser("s_axi_usr_5_ruser"),
      s_axi_usr_5_wid("s_axi_usr_5_wid")
         { };
    virtual ~Vpcie_ep();
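
The naming behaviour behind this workaround can be modelled in a few lines of plain C++. This is only a toy sketch under stated assumptions, not SystemC itself: `ToyPort`, `UnnamedModule`, `NamedModule` and `demo_naming` are hypothetical names, and the model uses a single global counter for the generated names purely for simplicity.

```cpp
#include <cassert>
#include <string>

// Toy model (not SystemC): an object that falls back to an auto-generated
// "port_<n>" name when it is constructed without an explicit name.
class ToyPort {
public:
    ToyPort() : name_("port_" + std::to_string(next_id()++)) {}
    explicit ToyPort(const std::string& name) : name_(name) {}
    const std::string& name() const { return name_; }

private:
    // Single global counter; an assumption made to keep the model small.
    static int& next_id() {
        static int id = 0;
        return id;
    }
    std::string name_;
};

// Members are default-constructed, so they only get generated names.
struct UnnamedModule {
    ToyPort awvalid;
    ToyPort awready;
};

// Members receive their own variable names, as in the hand-edited header.
struct NamedModule {
    ToyPort awvalid{"s_axi_pcie_m0_awvalid"};
    ToyPort awready{"s_axi_pcie_m0_awready"};
};

void demo_naming() {
    UnnamedModule u;
    // Generated names start with "port_" and differ from each other.
    assert(u.awvalid.name().rfind("port_", 0) == 0);
    assert(u.awvalid.name() != u.awready.name());

    NamedModule n;
    assert(n.awvalid.name() == "s_axi_pcie_m0_awvalid");
    assert(n.awready.name() == "s_axi_pcie_m0_awready");
}
```

A lookup that searches for "s_axi_pcie_m0_awvalid" can only succeed against the second kind of module, which is why passing the names explicitly made the connection error go away.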

Unfortunately, re-running make after editing obj_dir/Vpcie_ep.h was not an option, because make regenerates the verilated source code (Vpcie_ep.h/.cpp) and clobbers any manual changes.

Thus, I manually rebuilt refdesign-sim by running each relevant compile/link step separately, so that my manual changes to the verilated source code were picked up:

  1. recompiling: obj_dir/Vpcie_ep.o

  2. relinking all the obj_dir/Vpcie_ep*.o files into the static lib: Vpcie_ep__ALL.a

  3. recompiling refdesign-sim.o

  4. relinking refdesign-sim

Luckily, make prints each compile/link command, so I could rerun the steps above after modifying the verilated source code.

After this, I'm able to connect refdesign-sim to the QEMU VM:

> sudo ./refdesign-sim unix:/nis/asic/us_dump2/ricardga/temp/qemu_playground/machine-x86/qemu-rport-_machine_peripheral_rp0_rp 1000

        SystemC 2.3.2-Accellera --- Jun  2 2023 16:03:03
        Copyright (c) 1996-2017 by all Contributors,
        ALL RIGHTS RESERVED

Info: (I702) default timescale unit used for tracing: 1 ps (./refdesign-sim.vcd)
connect to /nis/asic/us_dump2/ricardga/temp/qemu_playground/machine-x86/qemu-rport-_machine_peripheral_rp0_rp

And in the QEMU guest, after the device_add commands, I can see this device:

01:00.0 Serial controller: Xilinx Corporation Device d004 (rev 12) (prog-if 01 [16450])
	Subsystem: Red Hat, Inc. Device 1100
	Physical Slot: 0
	Control: I/O- Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 23
	Region 0: Memory at fe800000 (32-bit, non-prefetchable) [size=1M]
	Capabilities: [40] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
			ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
		LnkCtl:	ASPM Disabled; RCB 64 bytes Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (ok), Width x1 (ok)
			TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, NROPrPrP-, LTR-
			 10BitTagComp-, 10BitTagReq-, OBFF Not Supported, ExtFmt+, EETLPPrefix+, MaxEETLPPrefixes 4
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-, TPHComp-, ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
			 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
	Kernel driver in use: vfio-pci

Note that the output also shows "Kernel driver in use: vfio-pci". This is after running the following refdesign-sim steps on the QEMU guest:

$ sudo modprobe vfio-pci nointxmask=1
$ sudo sh -c 'echo 10ee d004 > /sys/bus/pci/drivers/vfio-pci/new_id'

However, when I run the test on the QEMU guest, the test hangs:

> ls -l /sys/bus/pci/devices/0000\:01\:00.0/iommu_group
lrwxrwxrwx 1 root root 0 Jun  7 20:35 /sys/bus/pci/devices/0000:01:00.0/iommu_group -> ../../../../kernel/iommu_groups/3
ubuntu@ubuntu:~/Downloads/github/libsystemctlm-soc/tests/rtl-bridges/pcie$ sudo ./test-pcie-ep-master-vfio 0000:01:00.0 3 0

        SystemC 2.3.2-Accellera --- Jun  2 2023 18:34:05
        Copyright (c) 1996-2017 by all Contributors,
        ALL RIGHTS RESERVED
Device supports 9 regions, 5 irqs
mapped 0 at 0x7f9ec018c000

Info: (I702) default timescale unit used for t

I get no further output after this. I've tried this several times, including after adding SSH port forwarding to the QEMU guest and running from an SSH terminal (in case the stdio/serial port was somehow causing the issue).

I do see the refdesign-sim.vcd file growing in size after launching it. The QEMU guest just hangs and is unresponsive at this point. I can, however, Ctrl-C refdesign-sim, and that seems to crash/kill the QEMU VM. After that I see:

(qemu) qemu-system-x86_64: /machine/peripheral/rp0/rp: Disconnected clk=144537801101 ns

And the QEMU VM dies.

So I'm hoping someone may have some clues or insights into what is going wrong here, or some debugging tips. As I mentioned, the fact that I had to hack a workaround for the initial refdesign-sim error suggests that I have something set up or configured wrong, or that I'm missing a step.

Thanks!

gricardo99 • Jun 07 '23 21:06

Hi @gricardo99 ,

I got the same error. Did you get a chance to fix it or work around it?

Best regards :)

Kevin

kevinyuan • Nov 04 '23 00:11

Hi @kevinyuan
What error did you hit? My initial compile error, or the subsequent hang (after my compile error workaround)? Unfortunately no, I never resolved these issues with the demo. I had to move on to other things, so I haven't looked at this in a while. It would be great if someone from the project, maybe @edgarigl or @franciscoIglesias could take a look? Hopefully it's something very simple with the setup.

gricardo99 • Nov 07 '23 17:11

Hi @gricardo99 ,

I found that this error happens with Verilator from the Ubuntu repo, i.e. installed via "apt install ...".

I then compiled SystemC and Verilator from source, and the connection error disappeared. I double-checked by looking at the debug messages: there are no port_0/1/2 names anymore.

However, another error then occurred in the dynamic_cast<>, which tries to cast an sc_object to an sc_out.
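
The thread does not identify the cause of this second error, so the following is only a debugging sketch of how a `dynamic_cast` from a base object pointer behaves when the object's dynamic type is not the one requested. It uses plain C++ with hypothetical stand-in types (`ToyObject`, `ToyOut<T>`) and helpers (`as_out`, `dynamic_type_name`, `demo_cast`) rather than the real SystemC classes.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical stand-ins for a polymorphic base object and a typed output port.
struct ToyObject {
    virtual ~ToyObject() = default;
};

template <typename T>
struct ToyOut : ToyObject {
    T value{};
};

// Returns the object as ToyOut<T>*, or nullptr if its dynamic type differs.
template <typename T>
ToyOut<T>* as_out(ToyObject* obj) {
    return dynamic_cast<ToyOut<T>*>(obj);
}

// Implementation-defined (often mangled) name of the object's dynamic type;
// printing it next to a failed cast shows what the object really is.
std::string dynamic_type_name(const ToyObject& obj) {
    return typeid(obj).name();
}

void demo_cast() {
    ToyOut<int> port;
    ToyObject* obj = &port;

    // Matching template argument: the cast succeeds.
    assert(as_out<int>(obj) == &port);

    // Same template, different argument: unrelated type, the cast yields nullptr.
    assert(as_out<bool>(obj) == nullptr);

    // The reported dynamic type is ToyOut<int>, not ToyObject.
    assert(dynamic_type_name(*obj) == typeid(ToyOut<int>).name());
}
```

Logging the dynamic type name at the point where the cast returns null is one way to tell a type mismatch apart from a lookup that found the wrong object.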

kevinyuan • Nov 07 '23 18:11

Hi @gricardo99 ,

I hit this error initially:

Error: SIGNALS:: Unable to connect top.pcie_bridge.signals-master-tieoff_0.awvalid

kevinyuan • Nov 07 '23 18:11

@gricardo99 Have you tried installing Verilator from source? A precompiled Verilator installation might conflict with the gcc used when compiling SystemC. You need to use the same gcc/g++ version for compiling Verilator and your project's SystemC code.

wilberZen • Sep 29 '24 06:09

I also encountered this error while using Verilator v4.038 on Ubuntu 22.04.5.

Upgrading to Verilator v4.228 resolved the issue. refdesign-sim needs to be recompiled after the upgrade.
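
For anyone scripting this check, here is a small hedged sketch in plain C++ that parses the first line printed by `verilator --version` and compares it against a threshold. The threshold 4.228 is simply the version reported as working in this comment, not a documented minimum, and `VerilatorVersion`, `parse_verilator_version`, `at_least` and `demo_version` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Major/minor pair as printed by `verilator --version`, e.g. 4.038 -> {4, 38}.
struct VerilatorVersion {
    int major_num;
    int minor_num;
};

// Parses a line such as "Verilator 4.038 2020-07-11 rev v4.036-114-g0cd4a57ad".
// Returns false if the line does not start with "Verilator <major>.<minor>".
bool parse_verilator_version(const std::string& line, VerilatorVersion& out) {
    return std::sscanf(line.c_str(), "Verilator %d.%d",
                       &out.major_num, &out.minor_num) == 2;
}

// True if v is the same as or newer than major_num.minor_num.
bool at_least(const VerilatorVersion& v, int major_num, int minor_num) {
    return v.major_num > major_num ||
           (v.major_num == major_num && v.minor_num >= minor_num);
}

void demo_version() {
    VerilatorVersion v{};

    // The version from the original report is older than 4.228.
    assert(parse_verilator_version(
        "Verilator 4.038 2020-07-11 rev v4.036-114-g0cd4a57ad", v));
    assert(v.major_num == 4 && v.minor_num == 38);
    assert(!at_least(v, 4, 228));

    // A line that is not Verilator version output is rejected.
    assert(!parse_verilator_version("SystemC 2.3.3", v));
}
```

The comparison treats the part after the dot as an integer, so 4.038 is read as minor version 38 and correctly sorts before 4.228.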

luoguojie • Oct 09 '24 09:10