b-sumner

Results: 144 comments by b-sumner

@pcmoritz hip_bfloat16.h appeared before cuda_bf16.h and contains what was needed at the time. hip_bf16.h is meant to match cuda_bf16.h, and is probably the header of choice at this point. hip_bfloat16.h...

Thanks, I've forwarded your comments.

@pcmoritz hip_bf16.h is meant to simplify porting an application that includes cuda_bf16.h. The latter does not provide operators, and neither does the former. These headers instead provide functions...

@pcmoritz I don't know where that operator is coming from. https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH____BFLOAT16__FUNCTIONS.html certainly doesn't mention any operators.

Sorry, I now see them in section 1.3.2.

To start with, a lot of work on LLVM, whose IR cannot currently describe non-default-rounded operations. Then additional work on the AMDGPU target to support the updated IR.

We could enable some now, but unfortunately not all. Which ones are the most important?

There's been some question about exactly which bits in the 16-bit word correspond to R, G, and B as well, so we may want to cover that too. Similarly...
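To illustrate why the bit assignment matters, here is a minimal sketch that decodes the same 16-bit word under two candidate layouts. The 5-6-5 field widths and both function names are assumptions made for illustration; the comment above does not say which packed format or which ordering is in question.

```python
def unpack_565_rgb(word: int) -> tuple[int, int, int]:
    """Hypothetical layout: R in bits 15..11, G in bits 10..5, B in bits 4..0."""
    r = (word >> 11) & 0x1F
    g = (word >> 5) & 0x3F
    b = word & 0x1F
    return r, g, b


def unpack_565_bgr(word: int) -> tuple[int, int, int]:
    """Hypothetical alternative: B in bits 15..11, G in bits 10..5, R in bits 4..0."""
    b = (word >> 11) & 0x1F
    g = (word >> 5) & 0x3F
    r = word & 0x1F
    return r, g, b


word = 0xF800  # only the top five bits are set
print(unpack_565_rgb(word))  # (31, 0, 0): full red under the first layout
print(unpack_565_bgr(word))  # (0, 0, 31): full blue under the second layout
```

The same word decodes to a different color depending on the assumed layout, which is why the documentation would need to pin it down.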

MI-300 has no hardware support for image instructions. Whatever you wanted to do with them could likely be accomplished faster by your own code than by a slow emulation layer.

Independent verification of the algorithm: https://www.wolframalpha.com/input?i=tan%2819360%29+%2B+1%2Ftan%2819360+-+12325*Pi%2F2%29
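The expression in that link relies on the identity tan(x) = -1/tan(x - k*pi/2) for odd k, so the sum should be zero. The sketch below checks the same expression in double precision. It is not the algorithm under discussion, which the comment does not show; the helper name is hypothetical, and the residual is only expected to be small, not exactly zero, because pi is rounded.

```python
import math


def tan_via_cot_identity(x: float, k: int) -> float:
    """Evaluate tan(x) as -1/tan(x - k*pi/2); valid only for odd k (hypothetical helper)."""
    if k % 2 == 0:
        raise ValueError("k must be odd for this identity")
    reduced = x - k * (math.pi / 2)
    return -1.0 / math.tan(reduced)


x, k = 19360.0, 12325  # values taken from the linked expression

# Same expression as in the link: tan(x) + 1/tan(x - k*pi/2)
residual = math.tan(x) + 1.0 / math.tan(x - k * math.pi / 2)
print(residual)
print(math.tan(x), tan_via_cot_identity(x, k))
```

In double precision the two values agree only to within rounding error, since the subtraction x - k*pi/2 cancels most of the leading digits.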