DML icon indicating copy to clipboard operation
DML copied to clipboard

HW path reports error

Open suyashmahar opened this issue 1 year ago • 5 comments

I'm unable to use the HW path for mem move even after configuring the DSA devices:

$ sudo ./hl_mem_move_example hardware_path
Executing using dml::hardware path
Starting dml::mem_move example...
Copy 1KB of data from source into destination...
dml-diag: DML version TODO
dml-diag: Struct size: 3328 B
dml-diag: loading driver: libaccel-config.so.1
Failure occurred.

When manually calling dml::memmove, I get error code 16 that corresponds to internal library error. Is there a way to debug this? Any help would be really appreciated. Thanks!

System Configuration

Processor: Intel(R) Xeon(R) Silver 4416+

I have configured DSA using the python script:

$ sudo python3 accel_conf.py --load=../configs/1n1d1e1w-s-n1.conf
Filter:
Disabling active devices
    dsa0 - done
Loading configuration - done
Additional configuration steps
    Force block on fault: False
Enabling configured devices
    dsa0 - done
        wq0.0 - done
Checking configuration
    node: 0; device: dsa0; group: group0.0
        wqs:     wq0.0
        engines: engine0.0

I'm also running relatively recent kernel version:

$  uname -a
Linux machinename 6.8.0-rc7 #1 SMP PREEMPT_DYNAMIC Thu Mar  7 11:11:46 PST 2024 x86_64 x86_64 x86_64 GNU/Linux

Kernel cmdline:

$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.8.0-rc7 root=UUID=4f739d8f-4f15-4fc3-b419-bbb0202131b3 ro splash earlyprintk=ttyS1,115200 console=ttyS1,115200 c
onsole=ttyS0,115200 memmap=8G!16G nokaslr movable_node=2 intel_iommu=on,sm_on iommu=on vt.handoff=7

lspci output for one of the two devices available:

$ sudo lspci -vvv -s 75:01.0
75:01.0 System peripheral: Intel Corporation Device 0b25
        Subsystem: Intel Corporation Device 0000
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        NUMA node: 0
        IOMMU group: 1
        Region 0: Memory at 21bffff50000 (64-bit, prefetchable) [size=64K]
        Region 2: Memory at 21bffff20000 (64-bit, prefetchable) [size=128K]
        Capabilities: [40] Express (v2) Root Complex Integrated Endpoint, MSI 00
                DevCap: MaxPayload 512 bytes, PhantFunc 0
                        ExtTag+ RBE+ FLReset+
                DevCtl: CorrErr- NonFatalErr- FatalErr+ UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 512 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+ NROPrPrP- LTR+
                         10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 1
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis+ LTR- OBFF Disabled,
                         AtomicOpsCtl: ReqEn-
        Capabilities: [80] MSI-X: Enable+ Count=9 Masked-
                Vector table: BAR=0 offset=00002000
                PBA: BAR=0 offset=00003000
        Capabilities: [90] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [100 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq+ ACSViol-
                UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [150 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [160 v1] Transaction Processing Hints
                Device specific mode supported
                Steering table in TPH capability structure
        Capabilities: [170 v1] Virtual Channel
                Caps:   LPEVC=1 RefClk=100ns PATEntryBits=1
                Arb:    Fixed+ WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
                VC1:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=1 ArbSelect=Fixed TC/VC=02
                        Status: NegoPending- InProgress-
        Capabilities: [200 v1] Designated Vendor-Specific: Vendor=8086 ID=0005 Rev=0 Len=24 <?>
        Capabilities: [220 v1] Address Translation Service (ATS)
                ATSCap: Invalidate Queue Depth: 00
                ATSCtl: Enable+, Smallest Translation Unit: 00
        Capabilities: [230 v1] Process Address Space ID (PASID)
                PASIDCap: Exec- Priv+, Max PASID Width: 14
                PASIDCtl: Enable+ Exec- Priv+
        Capabilities: [240 v1] Page Request Interface (PRI)
                PRICtl: Enable+ Reset-
                PRISta: RF- UPRGI- Stopped+
                Page Request Capacity: 00000200, Page Request Allocation: 00000200
        Kernel driver in use: idxd
        Kernel modules: idxd

suyashmahar avatar May 06 '24 21:05 suyashmahar

Hi @suyashmahar, In the examples/high-level-api/mem_move_example.cpp, could you please also print out result.status right before "Failure occurred" message?

mzhukova avatar May 06 '24 21:05 mzhukova

Hi @mzhukova, I got 16: image

suyashmahar avatar May 06 '24 22:05 suyashmahar

Hi @mzhukova , are there any env flags / build configuration I can use to debug this issue? Thanks for the help!

suyashmahar avatar May 10 '24 17:05 suyashmahar

@mzhukova, I think I found the reason. If DML cannot find libaccel-config.so, it just reports an internal error. I confirmed this using strace.

Any HW initialization failure in this code is reported as a generic failure. If the "if" condition fails.

https://github.com/intel/DML/blob/8224bea9d8ba01bad98dc2022b7db98b3ccd38ff/sources/core/src/hardware_device.cpp#L42-L68

This is where the library tries to load libaccel-config.so

https://github.com/intel/DML/blob/8224bea9d8ba01bad98dc2022b7db98b3ccd38ff/sources/core/src/hw_dispatcher/hw_dispatcher.cpp#L45

If I make sure that libaccel-config.so is accessible, the hardware_path example works.

suyashmahar avatar May 13 '24 06:05 suyashmahar

Sorry for the delayed response @suyashmahar. I'm glad that you were able to find the root cause of the failure. We will work on improving the status reporting in one of the future releases.

mzhukova avatar May 13 '24 23:05 mzhukova