b-sumner comments

Results: 144 comments of b-sumner

Getting peak Binary32 flop/s on CDNA2 using float2

You could try -ffp-contract=fast. Unfortunately, float2 means something different in HIP and CUDA than it does in OpenCL, so using scalars may be the best approach.
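As a rough illustration of the scalar approach, here is a host-side C++ sketch (not HIP device code, and not taken from the thread). The function name `fma_pairs` and its parameters are hypothetical. Each loop iteration carries two independent multiply-adds written on plain floats; with -ffp-contract=fast the compiler is permitted to contract each `a * x + y` into a fused multiply-add. Whether that results in packed instructions on CDNA2 is not something this sketch demonstrates.

```cpp
#include <cstddef>

// Hypothetical helper (an assumption for illustration, not from the thread):
// computes y[i] = a[i] * x[i] + y[i] using two independent scalar chains per
// iteration instead of a float2 value.
void fma_pairs(const float* a, const float* x, float* y, std::size_t n) {
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        // Two independent scalar accumulators; no float2 type involved.
        float y0 = y[i];
        float y1 = y[i + 1];
        y0 = a[i] * x[i] + y0;          // eligible for contraction into an FMA
        y1 = a[i + 1] * x[i + 1] + y1;  // independent of the line above
        y[i] = y0;
        y[i + 1] = y1;
    }
    // Handle the last element when n is odd.
    if (i < n) {
        y[i] = a[i] * x[i] + y[i];
    }
}
```

Calling `fma_pairs(a, x, y, n)` on three arrays of length `n` updates `y` in place; the same two-accumulator pattern could be moved into a kernel body.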

Getting peak Binary32 flop/s on CDNA2 using float2

@etiennemlb would it be possible for you to provide a minimal HIP application that demonstrates the issue?

Getting peak Binary32 flop/s on CDNA2 using float2

Thank you. We now have an internal ticket open for this.

Porting HIP to run on other (non-AMD) accelerators

HIP assumes a SIMT programming model and requires the accelerator work to be expressible as a series of launches of up to 3D arrays of threads/work-items, much like CUDA,...

Porting HIP to run on other (non-AMD) accelerators

Regarding the device libraries, they are AMD specific and assume an AMD runtime. We will not accept changes adding support for other devices or platforms. However they can be, and...

[Feature Request]: atomicAdd() to support half2

It's unsafe because it causes the fast HW instruction to be generated, but those instructions don't work if they act on memory that is not cached, e.g. across a PCIe...

[Issue]: Conversion of tiny-cuda-nn lib into HIP

@Vishal-S-P somehow the HIP compiler is seeing that inline PTX at line 329 of vec.h and that certainly won't work. Apparently the guard "#if TCNN_MIN_GPU_ARCH >= 70" is somehow passing....
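One way to keep a guard like this from passing on AMD targets is to test the platform before the architecture number. The sketch below assumes `__HIP_PLATFORM_AMD__` is defined when HIP targets AMD hardware; `TCNN_USE_INLINE_PTX` and `uses_inline_ptx()` are hypothetical names introduced for illustration, not part of tiny-cuda-nn, and this is not necessarily the fix adopted in the issue.

```cpp
// Decide at preprocessing time whether the inline-PTX path may be compiled.
// TCNN_USE_INLINE_PTX is a hypothetical macro, introduced for illustration.
#if defined(__HIP_PLATFORM_AMD__)
  // PTX is NVIDIA-specific, so never select it when targeting AMD.
  #define TCNN_USE_INLINE_PTX 0
#elif defined(TCNN_MIN_GPU_ARCH) && TCNN_MIN_GPU_ARCH >= 70
  #define TCNN_USE_INLINE_PTX 1
#else
  // Architecture macro undefined or too low: use the portable path.
  #define TCNN_USE_INLINE_PTX 0
#endif

// Hypothetical helper that reports which path the preprocessor selected.
bool uses_inline_ptx() {
#if TCNN_USE_INLINE_PTX
    return true;
#else
    return false;
#endif
}
```

In a plain host build with neither macro defined, `uses_inline_ptx()` returns false, so the portable path would be taken.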

[Feature]: JIT compilation

@DemiMarie this is a frequently requested feature and we are exploring approaches that should help.

program execution hangs

@zjin-lcf can you compile with -g and crank up rocgdb and tell us where it hangs? There are many hardware differences between AMD and NVIDIA GPUs. Also those links are...

FR: generate MUL:2, MUL:4, DIV:2 for VOP3 instructions (OpenCL performance)

clang/LLVM changes to enable this are now upstream. The compiler options -mno-amdgpu-ieee and -fno-honor-nans are both required to enable such folding.