Felix Kuehling

90 comments by Felix Kuehling

Looks like you're using the amdgpu driver built into your 5.15 kernel. Are you sure the kernel includes the fix?

If you're logged in remotely, make sure your user account is in the "video" group. Otherwise you don't have access to required graphics driver device nodes. This is currently handled...
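As a quick way to check the group membership mentioned above, here is a minimal sketch that tests whether the current process belongs to the "video" group. The helper names `group_names` and `in_group` are made up for illustration and are not part of any ROCm tool. It only looks at the groups of the running process, so a group added since the last login will not show up until you log in again.

```python
import grp
import os


def group_names(gids, lookup=grp.getgrgid):
    """Map numeric group IDs to names, skipping IDs with no group entry."""
    names = set()
    for gid in gids:
        try:
            names.add(lookup(gid).gr_name)
        except KeyError:
            # A GID without a matching group database entry is ignored.
            continue
    return names


def in_group(group, gids=None, lookup=grp.getgrgid):
    """Return True if `group` is among the groups of the current process."""
    if gids is None:
        # Supplementary groups plus the effective primary group.
        gids = set(os.getgroups()) | {os.getegid()}
    return group in group_names(gids, lookup)


if __name__ == "__main__":
    print("in video group:", in_group("video"))
```

If this reports False in a remote session, adding the account to the group and logging in again is the usual fix.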

The test tries to allocate a maximum amount of system memory for GPU access. It looks like it ends up invoking the OOM killer. The log snippet in your report...

If CreateQueueStressSingleThreaded causes a crash, the problem is probably that graphics command submissions are timing out because compute is causing too much stress. CreateQueueStressSingleThreaded doesn't use a lot of the...

KFD is technically part of amdgpu.ko. KFD shares the GPU compute resources and VRAM with graphics. So it is possible that using the GPU for compute affects graphics usage negatively....

It should be on the rocm-5.7.x branch. E.g. https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/5268ea80e32a6f09f911114e12a20a8176055163/src/events.c#L220 and https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/5268ea80e32a6f09f911114e12a20a8176055163/src/memory.c#L598. The AsanHeaderPage symbols are only compiled when you build with address sanitizer enabled.

I'm a bit surprised that anyone is still building the Thunk as a shared library. We made it a static library for a reason: It has only a single user...

The Thunk API always had trouble with multiple clients in the same process. When we made it a static library, we just made a choice that it is not useful as...

ROCr APIs tend to be a bit more abstracted. But all the functionality should be there.

KFD doesn't do anything with the percentage in practice, except that 0% means the queue is disabled and anything else means it is enabled. When I say ROCr exposes all the functionality,...
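The percentage behaviour described above comes down to a single comparison. The sketch below only models that description; `queue_is_enabled` is a hypothetical name and this is not the KFD code.

```python
def queue_is_enabled(queue_percent: int) -> bool:
    """Model of the described behaviour: only 0% disables the queue."""
    # Any non-zero percentage is treated the same way: the queue is enabled.
    return queue_percent != 0
```

Under this model, 1% and 100% are indistinguishable, so the value works as an on/off switch rather than a throttle.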