b-sumner comments

Results: 105 comments by b-sumner

Inline asm translation

Yes, but using OCKL insulates you from any future ISA changes. It will always work.

Query Regarding Fast-Math Optimization Support on AMD GPUs

@yxsamliu it is possible, but of course it would not match the CUDA results on NVIDIA. Will that not be a problem?

Query Regarding Fast-Math Optimization Support on AMD GPUs

@yxsamliu __sinf is a CUDA/HIP function that is implemented with a call to the native sin function, while sin is implemented with a call to the regular OCML sin function....

Query Regarding Fast-Math Optimization Support on AMD GPUs

@yxsamliu exactly. However, I'd like to note that implementing this could break existing applications that are somehow dependent on the higher accuracy.

Query Regarding Fast-Math Optimization Support on AMD GPUs

Sounds reasonable to me. We'll need to be sure to document this change.

[Issue]: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+

Hi @fromtheeast710. I'm not sure why this is being reported to ROCm and not to the LAMMPS developers, who probably have not updated their code to handle GFX10+.

[Issue]: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+

Hello @fromtheeast710, the compiler is correctly stating that the source code is attempting to use an instruction field that is not supported by gfx1030. This is much preferable to the...

[Issue]: Illegal instruction detected: Invalid dpp_ctrl value: broadcasts are not supported on GFX10+

@fromtheeast710 the compiler treats gfx1030 and gfx1032 as separate targets; they have different names after all. Forcing ISA from one to run on another is risky.

Error compiling ROCR-Runtime: `'amdgpu_code_object_version': IDs have conflicting values`

Those functions were not removed, but their definitions were moved. It still appears something is not consistent in this build.

Correctly set the index value for __shf_up.

@jchlanda, (self & ~(width-1)) is the lowest lane in the group of width lanes that includes self. If index, the source lane, is below that value, then the shuffle up...
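
The lane arithmetic in that comment can be tried on the host. What follows is a minimal sketch, not the HIP implementation: the helper names group_base and shfl_up_source_lane are hypothetical, width is assumed to be a power of two, and because the comment is truncated, the fallback to the lane's own index is an assumption taken from the documented behavior of CUDA's __shfl_up (lanes whose source would fall below the group keep their own value).

```cpp
#include <cassert>

// Lowest lane in the group of `width` lanes that contains `self`.
// Assumes `width` is a power of two, so (width - 1) is a low-bit mask.
inline int group_base(int self, int width) {
    return self & ~(width - 1);
}

// Hypothetical helper: the lane a shuffle-up by `delta` would read from.
// Assumption: if the computed source lane lies below the group base,
// the lane reads its own value instead of crossing into another group.
inline int shfl_up_source_lane(int self, unsigned delta, int width) {
    int index = self - static_cast<int>(delta);
    return (index < group_base(self, width)) ? self : index;
}

// A few spot checks of the arithmetic.
inline void check_examples() {
    assert(group_base(13, 8) == 8);              // lanes 8..15 form one group
    assert(shfl_up_source_lane(13, 3, 8) == 10); // 10 is still inside 8..15
    assert(shfl_up_source_lane(9, 3, 8) == 9);   // 6 is below 8, so keep self
}
```

With width equal to the full wavefront size, the group base is always 0, so only lanes with self < delta fall back to their own value.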