[Unitrace] Too many very long calls to zeCommandListAppendMemoryFill on Lunar Lake

Open s-Nick opened this issue 9 months ago • 3 comments

Description

I am trying to profile and trace an application that uses the GPU on Lunar Lake. The resulting trace is unreadable because of very long zeCommandListAppendMemoryFill calls: they appear to take 10 minutes, far longer than the whole application runs. This does not happen on an Intel Arc B580.

An example is:

Image
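One way to flag such entries automatically is to compare each event's duration against the application's total run time. The sketch below is only an illustration: it assumes the trace has already been loaded as a list of Chrome Trace Event style dictionaries (`ph`, `name`, `ts`, `dur`, times in microseconds), and the helper name `find_implausible_events` and the sample data are hypothetical, not part of unitrace.

```python
def find_implausible_events(events, wall_time_us):
    """Return complete ('X') events whose duration exceeds the run's wall time.

    Assumption: `events` is a list of dicts in Chrome Trace Event style,
    with durations in microseconds under the "dur" key.
    """
    suspicious = []
    for ev in events:
        if ev.get("ph") != "X":
            continue  # only complete events carry a duration
        if ev.get("dur", 0) > wall_time_us:
            suspicious.append(ev)
    return suspicious


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        {"ph": "X", "name": "zeCommandListAppendMemoryFill", "ts": 10, "dur": 600_000_000},
        {"ph": "X", "name": "zeCommandListAppendMemoryFill", "ts": 50, "dur": 12},
        {"ph": "M", "name": "process_name"},
    ]
    # Assume the whole application ran for about 5 seconds.
    for ev in find_implausible_events(sample, wall_time_us=5_000_000):
        print(ev["name"], ev["dur"])
```

Any event reported this way cannot be a real measurement, since no single call can outlast the process that issued it.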

Environment

>cat /etc/issue
Ubuntu 24.10 

>dpkg -l | grep intel
ii  intel-fw-gpu                                  2024.37.5-362~22.04                      all          Firmware package for Intel integrated and discrete GPUs
ii  intel-gpu-tools                               1.29-1                                   amd64        tools for debugging the Intel graphics driver
ii  intel-igc-cm                                  1.0.225.54083-1077~24.04                 amd64        Intel(R) C for Metal Compiler -- CM Frontend lib
ii  intel-media-va-driver-non-free:amd64          25.1.0-0ubuntu1~ppa1                     amd64        VAAPI driver for the Intel GEN8+ Graphics family
ii  intel-metrics-discovery                       1.13.179-1077~24.04                      amd64        Intel(R) Metrics Discovery Application Programming Interface --
ii  intel-metrics-library                         1.0.182-1077~24.04                       amd64        Intel(R) Metrics Library for MDAPI (Intel(R) Metrics Discovery
ii  intel-microcode                               3.20250211.0ubuntu0.24.10.1              amd64        Processor microcode firmware for Intel CPUs
ii  intel-ocloc                                   24.52.32224.14-1077~24.04                amd64        Tool for managing Intel Compute GPU device binary format
ii  intel-opencl-icd                              24.52.32224.14-1077~24.04                amd64        Intel graphics compute runtime for OpenCL
ii  libchewing3:amd64                             0.9.0-1                                  amd64        intelligent phonetic input method library
ii  libchewing3-data                              0.9.0-1                                  all          intelligent phonetic input method library - data files
ii  libdrm-intel1:amd64                           2.4.122-1                                amd64        Userspace interface to intel-specific kernel DRM services -- runtime
ii  libze-intel-gpu1                              24.52.32224.14-1077~24.04                amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  xserver-xorg-video-intel                      2:2.99.917+git20210115-1build1           amd64        X.Org X server -- Intel i8xx, i9xx display driver

>  sycl-ls
[level_zero:gpu][level_zero:0] Intel(R) oneAPI Unified Runtime over Level-Zero, Intel(R) Arc(TM) Graphics 20.4.4 [1.6.32224+14]

s-Nick, Mar 21 '25 09:03

@s-Nick Thank you for reporting this issue. We are triaging it now.

zma2, Mar 25 '25 15:03

@s-Nick The device timestamps are incorrect on Lunar Lake (LNL). This issue does not reproduce on Arrow Lake with either Linux or Windows.

Image

zma2, Apr 15 '25 16:04

I've confirmed that the workaround provided in https://github.com/intel/pti-gpu/pull/93 goes a long way toward solving this issue. Some functions can still appear with a duration much longer than is possible, but re-running unitrace or increasing MAX_RETRY from the workaround resolves that.
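To illustrate why a larger MAX_RETRY could help, here is a generic bounded-retry pattern. This is not the code from the pull request, which is not reproduced here: the functions `read_with_retry`, `read_timestamps`, and `is_plausible`, as well as the default value of `MAX_RETRY`, are hypothetical stand-ins. The sketch only assumes that a bad timestamp reading can be detected and sampled again.

```python
MAX_RETRY = 3  # hypothetical default, not the value used by the workaround


def read_with_retry(read_timestamps, is_plausible, max_retry=MAX_RETRY):
    """Re-read a (start, end) pair until it looks plausible or retries run out.

    Returns (start, end, ok), where ok says whether the final reading passed
    the plausibility check.
    """
    start, end = read_timestamps()
    attempts = 0
    while not is_plausible(start, end) and attempts < max_retry:
        start, end = read_timestamps()
        attempts += 1
    return start, end, is_plausible(start, end)


if __name__ == "__main__":
    # Simulated readings: two bad samples followed by a good one.
    readings = iter([(0, 10**12), (100, 50), (100, 200)])

    def plausible(start, end):
        # Accept only non-negative durations below an arbitrary 1 s bound (in us).
        return 0 <= end - start <= 1_000_000

    print(read_with_retry(lambda: next(readings), plausible))
```

With a small retry budget, a run of bad readings can exhaust the attempts and leave an implausible duration in the trace; a larger budget gives more chances to obtain a sane sample, at the cost of extra reads.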

Rbiessy, Jun 12 '25 14:06