HW path reports error
I'm unable to use the HW path for mem_move even after configuring the DSA devices:
$ sudo ./hl_mem_move_example hardware_path
Executing using dml::hardware path
Starting dml::mem_move example...
Copy 1KB of data from source into destination...
dml-diag: DML version TODO
dml-diag: Struct size: 3328 B
dml-diag: loading driver: libaccel-config.so.1
Failure occurred.
When manually calling dml::mem_move, I get error code 16, which corresponds to an internal library error. Is there a way to debug this? Any help would be really appreciated. Thanks!
System Configuration
Processor: Intel(R) Xeon(R) Silver 4416+
I have configured DSA using the Python script:
$ sudo python3 accel_conf.py --load=../configs/1n1d1e1w-s-n1.conf
Filter:
Disabling active devices
dsa0 - done
Loading configuration - done
Additional configuration steps
Force block on fault: False
Enabling configured devices
dsa0 - done
wq0.0 - done
Checking configuration
node: 0; device: dsa0; group: group0.0
wqs: wq0.0
engines: engine0.0
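To cross-check this, the same device/work-queue state can be queried through the accel-config CLI from the idxd-config project (a sketch; it assumes the userspace tools are installed, and is guarded so it is harmless where they are not):

```shell
# Cross-check the DSA configuration reported above against the kernel's
# view, guarded so the snippet is a no-op on machines without the CLI.
if command -v accel-config >/dev/null 2>&1; then
    echo "accel-config view of enabled devices/work queues:"
    accel-config list || true
    checked="yes"
else
    echo "accel-config CLI not installed; skipping cross-check"
    checked="no"
fi
```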
I'm also running a relatively recent kernel version:
$ uname -a
Linux machinename 6.8.0-rc7 #1 SMP PREEMPT_DYNAMIC Thu Mar 7 11:11:46 PST 2024 x86_64 x86_64 x86_64 GNU/Linux
Kernel cmdline:
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.0-rc7 root=UUID=4f739d8f-4f15-4fc3-b419-bbb0202131b3 ro splash earlyprintk=ttyS1,115200 console=ttyS1,115200 console=ttyS0,115200 memmap=8G!16G nokaslr movable_node=2 intel_iommu=on,sm_on iommu=on vt.handoff=7
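In particular, DSA shared work queues need the IOMMU in scalable mode (the `intel_iommu=on,sm_on` option above); a quick sanity check against the running kernel's command line (a sketch, checking only for that exact option string):

```shell
# Sanity-check that scalable-mode IOMMU (needed for DSA shared work
# queues) is enabled on the running kernel's command line.
if grep -q 'intel_iommu=on,sm_on' /proc/cmdline; then
    iommu_sm="present"
else
    iommu_sm="missing"
fi
echo "intel_iommu=on,sm_on: $iommu_sm"
```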
lspci output for one of the two devices available:
$ sudo lspci -vvv -s 75:01.0
75:01.0 System peripheral: Intel Corporation Device 0b25
Subsystem: Intel Corporation Device 0000
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
NUMA node: 0
IOMMU group: 1
Region 0: Memory at 21bffff50000 (64-bit, prefetchable) [size=64K]
Region 2: Memory at 21bffff20000 (64-bit, prefetchable) [size=128K]
Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0
ExtTag+ RBE+ FLReset+
DevCtl: CorrErr- NonFatalErr- FatalErr+ UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+ LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
Capabilities: [80] MSI-X: Enable+ Count=9 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [90] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [150 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [160 v1] Transaction Processing Hints
Device specific mode supported
Steering table in TPH capability structure
Capabilities: [170 v1] Virtual Channel
Caps: LPEVC=1 RefClk=100ns PATEntryBits=1
Arb: Fixed+ WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
VC1: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=1 ArbSelect=Fixed TC/VC=02
Status: NegoPending- InProgress-
Capabilities: [200 v1] Designated Vendor-Specific: Vendor=8086 ID=0005 Rev=0 Len=24 <?>
Capabilities: [220 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
Capabilities: [230 v1] Process Address Space ID (PASID)
PASIDCap: Exec- Priv+, Max PASID Width: 14
PASIDCtl: Enable+ Exec- Priv+
Capabilities: [240 v1] Page Request Interface (PRI)
PRICtl: Enable+ Reset-
PRISta: RF- UPRGI- Stopped+
Page Request Capacity: 00000200, Page Request Allocation: 00000200
Kernel driver in use: idxd
Kernel modules: idxd
Hi @suyashmahar,
In examples/high-level-api/mem_move_example.cpp, could you please also print out result.status right before the "Failure occurred" message?
Hi @mzhukova,
I got 16.
Hi @mzhukova, are there any env flags / build configuration I can use to debug this issue? Thanks for the help!
@mzhukova, I think I found the reason. If DML cannot find libaccel-config.so, it just reports an internal error. I confirmed this using strace.
Any HW initialization failure in this code is reported as a generic failure when the "if" condition fails:
https://github.com/intel/DML/blob/8224bea9d8ba01bad98dc2022b7db98b3ccd38ff/sources/core/src/hardware_device.cpp#L42-L68
This is where the library tries to load libaccel-config.so
https://github.com/intel/DML/blob/8224bea9d8ba01bad98dc2022b7db98b3ccd38ff/sources/core/src/hw_dispatcher/hw_dispatcher.cpp#L45
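A quick way to reproduce that check outside of DML is to ask the runtime linker whether the library is resolvable (a sketch; the soname is the one the dml-diag line reports loading):

```shell
# Ask the runtime linker cache whether libaccel-config.so.1 (the soname
# dml-diag reports loading) can be resolved on this machine.
lib="libaccel-config.so.1"
if ldconfig -p 2>/dev/null | grep -q "$lib"; then
    resolvable="yes"
else
    resolvable="no"   # DML's hardware path then fails with the generic error
fi
echo "$lib resolvable: $resolvable"
```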
If I make sure that libaccel-config.so is accessible, the hardware_path example works.
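For anyone hitting the same issue, one way to make it accessible is to put the directory containing the library on the loader's search path before running the example (a sketch; /usr/local/lib64 is only an assumed install location, adjust it to wherever your accel-config build installed the library):

```shell
# Prepend the (assumed) install directory of libaccel-config.so.1 to the
# loader search path; adjust libdir to your actual install location.
libdir="/usr/local/lib64"
export LD_LIBRARY_PATH="${libdir}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
# then: ./hl_mem_move_example hardware_path
```

Note that sudo drops LD_* variables by default, so when the example must run as root the more robust fix is to install the library into a standard location and refresh the linker cache with `sudo ldconfig`.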
Sorry for the delayed response @suyashmahar. I'm glad that you were able to find the root cause of the failure. We will work on improving the status reporting in one of the future releases.