b-sumner

Results 144 comments of b-sumner

You could try -ffp-contract=fast. Unfortunately, float2 means something different in HIP and CUDA than it does in OpenCL, so using scalars may be the best approach.
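As a rough illustration of the scalar approach, here is a minimal sketch of a two-component multiply-add written with plain floats instead of a float2 type. The names `Pair` and `mad2` are hypothetical and not taken from the original issue; the assumption is that the goal is to leave each `a * b + c` expression in a form the compiler may contract into an FMA when built with -ffp-contract=fast. It is written as host C++ so it can be checked anywhere; the same body could presumably be marked `__device__` in HIP.

```cpp
// Hypothetical plain struct used only to return two scalars;
// it is NOT the HIP/CUDA/OpenCL float2 type.
struct Pair {
    float x;
    float y;
};

// Hypothetical helper: component-wise a * b + c using scalars only.
// Each expression is a simple multiply followed by an add, which is the
// pattern a compiler is allowed to fuse under -ffp-contract=fast.
inline Pair mad2(float ax, float ay,
                 float bx, float by,
                 float cx, float cy) {
    Pair r;
    r.x = ax * bx + cx;  // candidate for contraction into one FMA
    r.y = ay * by + cy;  // candidate for contraction into one FMA
    return r;
}
```

Whether the contraction actually happens depends on the compiler, target, and flags, so inspecting the generated ISA is the only way to confirm it.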

@etiennemlb would it be possible for you to provide a minimal HIP application that demonstrates the issue?

Thank you. We now have an internal ticket open for this.

HIP assumes a SIMT programming model and requires the accelerator work to be expressible as a series of launches of up to 3D arrays of threads/work-items, much like CUDA,...

Regarding the device libraries, they are AMD specific and assume an AMD runtime. We will not accept changes adding support for other devices or platforms. However they can be, and...

It's unsafe because it causes the fast HW instruction to be generated, but those instructions don't work if they act on memory that is not cached, e.g. across a PCIe...

@Vishal-S-P somehow the HIP compiler is seeing that inline PTX at line 329 of vec.h and that certainly won't work. Apparently the guard "#if TCNN_MIN_GPU_ARCH >= 70" is somehow passing....
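One possible way to keep an inline PTX path away from the HIP compiler is to tighten the guard so it also checks the platform. The sketch below is an assumption, not the project's actual fix: `TCNN_MIN_GPU_ARCH` is the macro quoted above, `__HIP_PLATFORM_AMD__` is the macro HIP defines when targeting AMD GPUs, and `EXAMPLE_USE_INLINE_PTX` and `uses_inline_ptx` are hypothetical names introduced for illustration.

```cpp
// Select the inline-PTX path only when the architecture macro is defined,
// is at least 70, and we are NOT compiling for the AMD HIP platform.
// EXAMPLE_USE_INLINE_PTX is a hypothetical macro for this sketch.
#if defined(TCNN_MIN_GPU_ARCH) && (TCNN_MIN_GPU_ARCH >= 70) && \
    !defined(__HIP_PLATFORM_AMD__)
#define EXAMPLE_USE_INLINE_PTX 1
#else
#define EXAMPLE_USE_INLINE_PTX 0
#endif

// Hypothetical helper that reports which path the preprocessor selected,
// so the choice can be checked at compile time or in a quick test.
constexpr bool uses_inline_ptx() {
    return EXAMPLE_USE_INLINE_PTX != 0;
}
```

Printing or static-asserting on `uses_inline_ptx()` in a HIP build would show whether the guard still passes unexpectedly.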

@DemiMarie this is a frequently requested feature and we are exploring approaches that should help.

@zjin-lcf can you compile with -g, crank up rocgdb, and tell us where it hangs? There are many hardware differences between AMD and NVIDIA GPUs. Also those links are...

clang/LLVM changes to enable this are now upstream. The compiler options -mno-amdgpu-ieee and -fno-honor-nans are both required to enable such folding.