Julian Samaroo
Ok, CompatHelper already did that for us, sorry. Will merge some of them.
I believe this has been resolved.
> and we probably should not be breaking GPUCompiler willy nilly these days

If type inference is now thread-safe, then we shouldn't need to be accessing this lock, we only...
Closing as this seems to be fixed in #306; thanks for looking into this, @torrance!
I think this should be fine to merge, since we've resolved the above issues with compat bounds on dependencies.
This bug is specific to the HSA regions API; the AMD memory pools extension API shows the correct set of memory regions (for global coarse-grained and group only) per-device.
Also, I don't do `#ifdef _GNU_SOURCE` because it seems to be defined even on musl.
@kzhuravl what version of ROCm are you using to test this? I remember that when I tested this, I wasn't able to query the `.kd` symbols by name either; if that...
Follow-up question on this topic: how does one efficiently wait on a host-written signal from the device? Currently, a sleep-check loop causes such high GPU utilization that subsequent `hsa_executable_freeze` calls...
@b-sumner yes, and it isn't just `hsa_executable_freeze`; the progress of kernels submitted (on the same queue) after the high-load kernel has started can basically just come to a halt. I've...