
[DG1] free memory is reported incorrectly by Sysman (no memory used)

Open eero-t opened this issue 3 years ago • 4 comments

Setup:

  • latest public (5.16) "drm-tip" kernel with DG1 force-probed + GuC v69.0.3
  • compute-runtime 21.52.22081 [1]

Use-case:

  • Query memory info using Sysman APIs, or using compute-runtime zello_sysman tester

Result:

  • Free memory is always reported as equal to the total allocatable memory, i.e. 0 bytes in use, regardless of how heavily the GPU is used:
# docker run -it --env ZES_ENABLE_SYSMAN=1 --rm --user root --cap-drop ALL --cap-add SYS_ADMIN --device /dev/dri:/dev/dri:rw --network none registry.fi.intel.com/dgpu-enabling/collectd-gpu-plugin:GIT-2022-01-14 zello_sysman --memory

Device Name = Intel(R) Iris(R) Xe MAX Graphics [0x4905]
Device Name = Intel(R) Iris(R) Xe MAX Graphics [0x4905]

 ----  Memory tests ---- 
Memory Type = ZES_MEM_TYPE_DDR
On Subdevice = 
Subdevice Id = 0
Memory Size = 0
Number of channels = -1
Memory Health = ZES_MEM_HEALTH_OK
The total allocatable memory in bytes = 4219469824
The free memory in bytes = 4219469824
ZE_RESULT_ERROR_UNSUPPORTED_FEATURE returned by zesMemoryGetBandwidth(handle, &memoryBandwidth): testSysmanMemory: 740
Memory Read Counter = 0
Memory Write Counter = 0
Memory Maximum Bandwidth = 0
Memory Timestamp = 0

 ----  Memory tests ---- 
Memory Type = ZES_MEM_TYPE_DDR
On Subdevice = 
Subdevice Id = 0
Memory Size = 0
Number of channels = -1
Memory Health = ZES_MEM_HEALTH_OK
The total allocatable memory in bytes = 4219469824
The free memory in bytes = 4219469824
ZE_RESULT_ERROR_UNSUPPORTED_FEATURE returned by zesMemoryGetBandwidth(handle, &memoryBandwidth): testSysmanMemory: 740
Memory Read Counter = 0
Memory Write Counter = 0
Memory Maximum Bandwidth = 0
Memory Timestamp = 0
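
For anyone wanting to check for this symptom automatically, here is a minimal sketch that parses the `zello_sysman --memory` output shown above and computes used memory as total minus free. It relies only on the two "total allocatable" and "free memory" lines as they appear in this log; the function names are hypothetical, and the exact wording of those lines may differ in other compute-runtime versions.

```python
import re
from typing import List

# Patterns match the two lines printed by zello_sysman in the log above.
# Assumption: other compute-runtime versions print the same wording.
TOTAL_RE = re.compile(r"The total allocatable memory in bytes = (\d+)")
FREE_RE = re.compile(r"The free memory in bytes = (\d+)")


def used_memory_per_module(output: str) -> List[int]:
    """Return used bytes (total - free) for each memory module in the output."""
    totals = [int(m) for m in TOTAL_RE.findall(output)]
    frees = [int(m) for m in FREE_RE.findall(output)]
    if len(totals) != len(frees):
        raise ValueError("mismatched number of total/free lines")
    return [total - free for total, free in zip(totals, frees)]


def looks_like_this_bug(output: str) -> bool:
    """True if at least one module is listed and all report free == total."""
    used = used_memory_per_module(output)
    return bool(used) and all(u == 0 for u in used)


if __name__ == "__main__":
    sample = (
        "The total allocatable memory in bytes = 4219469824\n"
        "The free memory in bytes = 4219469824\n"
    )
    print(used_memory_per_module(sample))
    print(looks_like_this_bug(sample))
```

A result of 0 bytes used is only suspicious when a GPU workload is known to be running while the query is made.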

[1] Later compute-runtime releases fail to build. I have not filed a bug for this yet, as I'm waiting for the latest IGC release's build issue with compute-runtime to be solved first: https://github.com/intel/intel-graphics-compiler/issues/224

eero-t, Jan 19 '22 14:01

Thanks for reporting the issue. We are looking into it internally and will update you.

saik-intel, Feb 02 '22 05:02

The same issue also occurs with last week's drm-tip 5.17-rc4 kernel, GuC 69.0.3 and the latest compute-runtime release, "22.07.22465".

eero-t, Feb 21 '22 14:02

Still happens with last week's drm-tip 5.18-rc3 kernel, GuC 70.1.1 and the latest compute-runtime release, "22.16.22992".

eero-t, Apr 25 '22 14:04

Still happens with the drm-tip 5.18 kernel, GuC 70.1.1 and the latest compute-runtime release, "22.23.23405".

eero-t, Jun 14 '22 13:06

Tested the drm-tip 6.0-rc3 kernel, GuC 70.1.1, and compute-runtime "22.31.23852", and memory reporting seems to be working now -> closing.

eero-t, Sep 01 '22 16:09