Andrey Alekseenko
> Also, it can look like the host accessor construction waits for the second kernel just because GPU serializes kernel execution and memory copy. That's correct, but I don't see...
The issue still persists: Using cgh::copy, the submission latency is even across all platforms: ``` $ clang++ -Wno-deprecated-declarations -fsycl-targets=nvptx64-nvidia-cuda,spir64 -fsycl sycl_opencl_scheduling.cpp -O2 -g -o test $ SYCL_DEVICE_FILTER=level_zero:gpu ./test Buffer reset...
> What if the accessor `auto gm_data = buf.get_access(cgh);`, line 26, is created in the command group scope, outside of the kernel, line 51 in your code before...
The problem persists with oneAPI DPC++/C++ Compiler 2022.1.0, compute-runtime 22.29.23750. Note: It seems `-fsycl-dead-args-optimization` is now enabled for optimization levels `-O2` and `-O3`, so this bug is only triggered with...
The problem still reproduces on the same machine with cc03176dc3c938aa9fef808d57471d540b69931f and with both ROCm 4.5.2 and ROCm 5.0.2, but is much rarer with the latter.
On a different machine with gfx1032 and ROCm 5.2.0, the problem does **not** reproduce. On the original machine with gfx906, I will need some time to do the update.
The original machine, 1c3d598 (2022-10-06), ROCm 5.3.0, gfx906, kernel 5.15.0-48, does not reproduce anymore. I guess either the ROCm upgrade or the kernel upgrade did the trick.
@zjin-lcf AMD RDNA architecture also supports 32-wide execution. EDIT: But I specifically am more concerned with the function not throwing rather than querying the support for Wave32.
> Do I understand correctly, that notion of warps in HIP is the same as in CUDA and it matches the subgroups notion in SYCL? It's called "wavefronts" in AMD...
> @al42and , could you, please, clarify if you encounter exception throw with AMD GPU and HIP backend or OpenCL backend? @s-kanaev, it is with OpenCL backend, as mentioned in...