compute-runtime
[DG1] free memory is reported incorrectly by Sysman (no memory used)
Setup:
- latest public (5.16) "drm-tip" kernel with DG1 force-probed + GuC v69.0.3
- compute-runtime 21.52.22081 [1]
Use-case:
- Query memory info using the Sysman APIs, or using the compute-runtime zello_sysman tester
Result:
- Free memory is always reported as equal to the total allocatable memory, i.e. 0 bytes used, regardless of how much GPU memory is actually in use
# docker run -it --env ZES_ENABLE_SYSMAN=1 --rm --user root --cap-drop ALL --cap-add SYS_ADMIN --device /dev/dri:/dev/dri:rw --network none registry.fi.intel.com/dgpu-enabling/collectd-gpu-plugin:GIT-2022-01-14 zello_sysman --memory
Device Name = Intel(R) Iris(R) Xe MAX Graphics [0x4905]
Device Name = Intel(R) Iris(R) Xe MAX Graphics [0x4905]
---- Memory tests ----
Memory Type = ZES_MEM_TYPE_DDR
On Subdevice =
Subdevice Id = 0
Memory Size = 0
Number of channels = -1
Memory Health = ZES_MEM_HEALTH_OK
The total allocatable memory in bytes = 4219469824
The free memory in bytes = 4219469824
ZE_RESULT_ERROR_UNSUPPORTED_FEATURE returned by zesMemoryGetBandwidth(handle, &memoryBandwidth): testSysmanMemory: 740
Memory Read Counter = 0
Memory Write Counter = 0
Memory Maximum Bandwidth = 0
Memory Timestamp = 0
---- Memory tests ----
Memory Type = ZES_MEM_TYPE_DDR
On Subdevice =
Subdevice Id = 0
Memory Size = 0
Number of channels = -1
Memory Health = ZES_MEM_HEALTH_OK
The total allocatable memory in bytes = 4219469824
The free memory in bytes = 4219469824
ZE_RESULT_ERROR_UNSUPPORTED_FEATURE returned by zesMemoryGetBandwidth(handle, &memoryBandwidth): testSysmanMemory: 740
Memory Read Counter = 0
Memory Write Counter = 0
Memory Maximum Bandwidth = 0
Memory Timestamp = 0
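To make the symptom concrete, here is a minimal sketch that parses the zello_sysman --memory text output above and computes used = total minus free for each reported memory module. The helper names `memory_usage` and `looks_unreported` are hypothetical, and the sketch only assumes the two label strings exactly as they appear in the log. It does not call the Sysman API itself. With the values from this log, every module comes out as 0 bytes used.

```python
import re
from typing import List, Tuple

# Label strings copied from the zello_sysman --memory output in this issue.
TOTAL_RE = re.compile(r"The total allocatable memory in bytes = (\d+)")
FREE_RE = re.compile(r"The free memory in bytes = (\d+)")


def memory_usage(log: str) -> List[Tuple[int, int, int]]:
    """Return (total, free, used) for each memory module found in the log text."""
    totals = [int(v) for v in TOTAL_RE.findall(log)]
    frees = [int(v) for v in FREE_RE.findall(log)]
    if len(totals) != len(frees):
        raise ValueError("log has unmatched total/free memory lines")
    # Used memory is simply total minus free for each module.
    return [(t, f, t - f) for t, f in zip(totals, frees)]


def looks_unreported(usage: List[Tuple[int, int, int]]) -> bool:
    """True when every module claims zero bytes used (the symptom in this issue)."""
    return bool(usage) and all(used == 0 for _, _, used in usage)


# Values taken from one memory module block in the log above.
sample = (
    "The total allocatable memory in bytes = 4219469824\n"
    "The free memory in bytes = 4219469824\n"
)
usage = memory_usage(sample)
print(usage)                    # [(4219469824, 4219469824, 0)]
print(looks_unreported(usage))  # True
```

On a working setup, running a GPU workload while collecting this output should give a non-zero used value, so `looks_unreported` would return False.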
[1] Later compute-runtime releases fail to build. I have not filed a bug for this, as I'm waiting for the build issue between the latest IGC release and compute-runtime to be resolved first: https://github.com/intel/intel-graphics-compiler/issues/224
Thanks for reporting the issue. We are looking into it internally and will update you.
The same issue also occurs with last week's drm-tip 5.17-rc4 kernel, GuC 69.0.3, and the latest compute-runtime "22.07.22465" release.
Still happens with last week's drm-tip 5.18-rc3 kernel, GuC 70.1.1, and the latest compute-runtime "22.16.22992" release.
Still happens with the drm-tip 5.18 kernel, GuC 70.1.1, and the latest compute-runtime "22.23.23405" release.
Tested the drm-tip 6.0-rc3 kernel, GuC 70.1.1, and compute-runtime "22.31.23852": memory reporting now seems to work, so closing.