b-sumner comments

Difference between hip_bfloat16 and __hip_bfloat16?

@pcmoritz hip_bfloat16.h appeared before cuda_bf16.h and contains what was needed at the time. hip_bf16.h is meant to match cuda_bf16.h, and is probably the header of choice at this point. hip_bfloat16.h...

Thanks, I've forwarded your comments.

@pcmoritz hip_bf16.h is meant to simplify porting an application that includes cuda_bf16.h. The latter does not provide operators, and neither does the former. These headers instead provide functions...

@pcmoritz I don't know where that operator is coming from. https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH____BFLOAT16__FUNCTIONS.html certainly doesn't mention any operators.

Sorry, I now see them in section 1.3.2.
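Both types hold the same 16-bit bfloat16 format: the upper half of an IEEE 754 binary32 value (1 sign bit, 8 exponent bits, 7 fraction bits). As a host-side illustration only, the Python sketch below converts to and from that format with round-to-nearest-even and defines a function-style add that widens to float, adds, and rounds back. The helper names (`float_to_bf16_bits`, `bf16_add`, and so on) are hypothetical; this is a common conversion technique, not the code in hip_bfloat16.h or hip_bf16.h.

```python
import struct

def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE 754 binary32 (x must fit in binary32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """binary32 value of a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def float_to_bf16_bits(x: float) -> int:
    """Round x (via binary32) to bfloat16 bits, ties to even."""
    u = f32_bits(x)
    if (u & 0x7F800000) == 0x7F800000 and (u & 0x007FFFFF):
        # NaN: truncate, but force the quiet bit so a payload held only in
        # the low 16 bits cannot turn into an infinity.
        return (u >> 16) | 0x0040
    # Add 0x7FFF plus the lowest kept bit, then truncate: this rounds to
    # nearest and breaks exact ties toward an even result.
    u += 0x7FFF + ((u >> 16) & 1)
    return (u >> 16) & 0xFFFF

def bf16_bits_to_float(b: int) -> float:
    """Widen bfloat16 bits to a float by zero-filling the low 16 bits."""
    return bits_f32((b & 0xFFFF) << 16)

def bf16_add(a: int, b: int) -> int:
    """Hypothetical function-style add: widen, add, round back to bfloat16.

    Assumes the sum stays within binary32 range.
    """
    return float_to_bf16_bits(bf16_bits_to_float(a) + bf16_bits_to_float(b))

one = float_to_bf16_bits(1.0)
print(hex(one), hex(bf16_add(one, one)))
```

Widening to float for the arithmetic keeps the sketch short; exposing `bf16_add` as a named function rather than `operator+` mirrors the function-style API discussed above.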

IEEE 754 rounding modes

To start with, it would take a lot of work on LLVM, whose IR cannot currently describe non-default-rounded operations, followed by additional work on the AMDGPU target to support the updated IR.

We could enable some now, but unfortunately not all. Which ones are the most important?
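To make "non-default-rounded" concrete: IEEE 754 defines roundTiesToEven (the default) plus roundTowardZero, roundTowardPositive, and roundTowardNegative. The Python sketch below rounds an exact rational result to binary32 under each of the four, using the labels RNE, RTZ, RUP, and RDN chosen here for brevity. It is a host-side model of the semantics, assuming finite, in-range values, and says nothing about how LLVM or the AMDGPU backend would implement them.

```python
import struct
from fractions import Fraction

MODES = ("RNE", "RTZ", "RUP", "RDN")

def f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def next_up(x: float) -> float:
    """Smallest binary32 value greater than x (x must be a finite binary32)."""
    if x == 0.0:
        return bits_f32(1)  # smallest positive subnormal
    b = f32_bits(x)
    return bits_f32(b + 1 if x > 0 else b - 1)

def next_down(x: float) -> float:
    return -next_up(-x)

def round_f32(exact: Fraction, mode: str) -> float:
    """Round an exact value to binary32 under one IEEE 754 rounding direction."""
    if mode not in MODES:
        raise ValueError(f"unknown rounding mode: {mode}")
    approx = bits_f32(f32_bits(float(exact)))  # a binary32 adjacent to exact
    if Fraction(approx) == exact:
        return approx
    if Fraction(approx) < exact:
        lo, hi = approx, next_up(approx)
    else:
        lo, hi = next_down(approx), approx
    if mode == "RUP":
        return hi
    if mode == "RDN":
        return lo
    if mode == "RTZ":
        return lo if exact > 0 else hi
    # RNE: nearest neighbor; on an exact tie pick the even significand.
    d_lo, d_hi = exact - Fraction(lo), Fraction(hi) - exact
    if d_lo != d_hi:
        return lo if d_lo < d_hi else hi
    return lo if f32_bits(lo) % 2 == 0 else hi

a = Fraction(1) + Fraction(3, 2**25)  # 1 + 0.75 ulp in binary32
for mode in MODES:
    print(mode, round_f32(a, mode), round_f32(-a, mode))
```

For this input all four directions disagree somewhere across the positive and negative cases, which is why each one a target supports has to be expressible separately in the IR.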

Missing Spec Descriptions for CL_UNORM_SHORT_565 and CL_UNORM_SHORT_555

There's been some question about exactly which bits in the 16-bit word correspond to R, G, and B as well, so we may want to cover that too. Similarly...
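For reference, the Python sketch below decodes both formats under one common convention: red in the most significant bits, blue in the least significant, and the top bit of the 555 word ignored. That bit assignment is exactly the open question above, so treat it as an assumption rather than what the OpenCL specification states; the function names are hypothetical. UNORM decoding maps an n-bit integer v to v / (2^n - 1).

```python
def unorm(value: int, bits: int) -> float:
    """Map an unsigned `bits`-wide integer to [0.0, 1.0]."""
    return value / ((1 << bits) - 1)

def unpack_unorm_short_565(word: int) -> tuple[float, float, float]:
    # ASSUMED layout: R = bits 15..11, G = bits 10..5, B = bits 4..0.
    return (unorm((word >> 11) & 0x1F, 5),
            unorm((word >> 5) & 0x3F, 6),
            unorm(word & 0x1F, 5))

def unpack_unorm_short_555(word: int) -> tuple[float, float, float]:
    # ASSUMED layout: bit 15 ignored, R = bits 14..10, G = bits 9..5, B = bits 4..0.
    return (unorm((word >> 10) & 0x1F, 5),
            unorm((word >> 5) & 0x1F, 5),
            unorm(word & 0x1F, 5))

print(unpack_unorm_short_565(0xF800), unpack_unorm_short_555(0x03E0))
```

Swapping R and B (the other plausible reading) only changes the shift amounts, which is why the spec needs to pin the layout down explicitly.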

[Feature]: support image instructions for MI300X?

MI-300 has no hardware support for image instructions. Whatever you wanted to do with them could likely be accomplished faster by your own code than by a slow emulation layer.
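As one hypothetical example: if the goal was filtered texture reads, the bilinear filtering a sampler performs amounts to four loads and a few multiply-adds that a kernel can do directly on a plain buffer. The host-side Python sketch below shows that arithmetic; the function name, clamp-to-edge addressing, and the convention that texel centers sit at (i + 0.5) / width are assumptions chosen for illustration.

```python
import math

def sample_bilinear(img: list[list[float]], u: float, v: float) -> float:
    """Bilinear sample at normalized coordinates (u, v), clamp-to-edge.

    ASSUMED convention: texel (i, j) is centered at ((i + 0.5) / w, (j + 0.5) / h).
    """
    h, w = len(img), len(img[0])
    x = u * w - 0.5
    y = v * h - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def texel(i: int, j: int) -> float:
        # Clamp-to-edge addressing: out-of-range indices reuse the border texel.
        i = min(max(i, 0), w - 1)
        j = min(max(j, 0), h - 1)
        return img[j][i]

    top = texel(x0, y0) * (1 - fx) + texel(x0 + 1, y0) * fx
    bottom = texel(x0, y0 + 1) * (1 - fx) + texel(x0 + 1, y0 + 1) * fx
    return top * (1 - fy) + bottom * fy

image = [[0.0, 1.0],
         [2.0, 3.0]]
print(sample_bilinear(image, 0.5, 0.5))
```

In a real kernel the same indexing and weights would be applied per work-item, and a kernel that knows its access pattern can often skip the clamping or filtering it does not need.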

Relax the required accuracy of tan(half)?

Independent verification of the algorithm: https://www.wolframalpha.com/input?i=tan%2819360%29+%2B+1%2Ftan%2819360+-+12325*Pi%2F2%29
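For odd k, tan(x - k*pi/2) = -cot(x), so tan(x) + 1/tan(x - k*pi/2) = 0. With x = 19360 and k = 12325, the reduced argument r = x - k*pi/2 is about -0.0647, so the query appears to check the usual odd-quadrant reduction step tan(x) = -1/tan(r). The Python sketch below repeats that check in double precision with a naive single-constant reduction; the helper names are hypothetical, and it illustrates the identity rather than the half-precision tan implementation under discussion.

```python
import math
import struct

def to_half(x: float) -> float:
    """Round x to IEEE 754 binary16 and back, using struct's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def reduce_pi_over_2(x: float) -> tuple[int, float]:
    """Naive reduction x = k*(pi/2) + r, with k the nearest integer.

    A single double-precision pi/2 is used, so r carries an absolute error
    on the order of 1e-12 for |x| near 2e4; production code reduces more carefully.
    """
    k = round(x / (math.pi / 2))
    r = x - k * (math.pi / 2)
    return k, r

def tan_reduced(x: float) -> float:
    """tan(x) from the reduced argument: tan(r) for even k, -1/tan(r) for odd k."""
    k, r = reduce_pi_over_2(x)
    return math.tan(r) if k % 2 == 0 else -1.0 / math.tan(r)

x = to_half(19360.0)  # exactly representable in binary16: 19360 = 1210 * 16
k, r = reduce_pi_over_2(x)
print(k, r, tan_reduced(x), math.tan(x))
```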