
Why change the order of make_block_ptr when V.dtype.element_ty == tl.float8e5?

Open: yunjiangster opened this issue 1 year ago

The fused attention tutorial contains this line:

```python
v_order: tl.constexpr = (0, 1) if V.dtype.element_ty == tl.float8e5 else (1, 0)
```

I can't figure out why the order depends on the element type, and the tutorial doesn't give an example input of type float8e5. According to this blog (https://mengyibai.com/p/order-in-triton-make-block-ptr/), the order exists only to help the compiler generate more efficient code, and it equals `np.argsort(strides)`. (The blog seems to have a typo: it writes `np.argsort(-strides)`.)
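A minimal NumPy sketch of the `argsort(strides)` reading of `order`, assuming `order` lists dimensions from fastest- to slowest-varying in memory. `order_from_strides` is a hypothetical helper for illustration, not a Triton API:

```python
import numpy as np

def order_from_strides(a: np.ndarray) -> tuple:
    # Hypothetical helper: dimensions sorted from smallest to largest stride,
    # i.e. from fastest- to slowest-varying in memory.
    return tuple(int(d) for d in np.argsort(a.strides, kind="stable"))

n_ctx, head_dim = 128, 64

# Row-major (C-contiguous) tile: the last dimension is contiguous.
v_row = np.zeros((n_ctx, head_dim), dtype=np.float16)
print(v_row.strides)              # (128, 2) bytes
print(order_from_strides(v_row))  # (1, 0)

# Same logical shape, column-major storage: the first dimension is contiguous.
v_col = np.asfortranarray(v_row)
print(v_col.strides)              # (2, 256) bytes
print(order_from_strides(v_col))  # (0, 1)

# Negating the strides reverses the result, hence the suspected typo.
print(tuple(int(d) for d in np.argsort(-np.array(v_row.strides))))  # (0, 1)
```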

yunjiangster, Jun 02 '24 02:06

Hi, I still don't get it. Why would changing the order be more efficient?

namespace-Pt, Sep 25 '24 11:09

The V matrix is used as the B operand of a `wgmma` instruction (in `P @ V`). `wgmma` accepts FP16 operands in either row-major or column-major layout, but for FP8 types the A matrix must be row-major and B must be column-major. In other words, for FP8 both operands must be contiguous along the reduction dimension K.
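To connect this back to `order`, here is a NumPy sketch. It assumes V is the B operand of `P @ V` with tile shape (BLOCK_N, HEAD_DIM), so K is dimension 0. It also assumes the FP8 V tensor is stored transposed in memory, which is what the tutorial's `v_order` implies. `v_order`, `fastest_first` and `b_is_k_major` are hypothetical helpers, and because NumPy has no FP8 dtype, float16 stands in (only the strides matter):

```python
import numpy as np

BLOCK_N, HEAD_DIM = 64, 128  # V tile is (K, N) = (BLOCK_N, HEAD_DIM) in P @ V

def v_order(is_fp8: bool) -> tuple:
    # Hypothetical helper mirroring the tutorial's v_order expression.
    return (0, 1) if is_fp8 else (1, 0)

def fastest_first(a: np.ndarray) -> tuple:
    # Dimensions sorted from smallest to largest stride, i.e. argsort(strides).
    return tuple(int(d) for d in np.argsort(a.strides, kind="stable"))

def b_is_k_major(b: np.ndarray) -> bool:
    # "Column-major B" for wgmma: the K dimension (dim 0 here) is contiguous.
    return b.strides[0] < b.strides[1]

v_rowmajor = np.zeros((BLOCK_N, HEAD_DIM), dtype=np.float16)  # FP16 path layout
v_kmajor = np.asfortranarray(v_rowmajor)                       # assumed FP8 path layout

for name, v in [("row-major V", v_rowmajor), ("K-major V", v_kmajor)]:
    print(name, fastest_first(v), "K-major:", b_is_k_major(v))
# row-major V (1, 0) K-major: False
# K-major V (0, 1) K-major: True
```

Under these assumptions, the FP8 tile's `argsort(strides)` is (0, 1), matching the tutorial's `v_order` for `tl.float8e5`. The order changes because V's memory layout changes for FP8, not because `make_block_ptr` treats the dtype specially.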

peterbell10, Nov 18 '24 21:11