Hongtao Yu

53 comments by Hongtao Yu

> BTW, @htyu can you help me understand when it's safe to apply `tl.multiple_of` or `tl.max_contiguous` here?
>
> ```
> ram = tl.max_contiguous(tl.multiple_of(rm % M, BLOCK_M), BLOCK_M)
> ```
>
> ...
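One way to reason about the question above is to check, outside Triton, whether each program instance's `rm % M` really is a single run of `BLOCK_M` consecutive rows starting at a multiple of `BLOCK_M`. That is our reading of what the combined `tl.multiple_of` / `tl.max_contiguous` hint is being used to promise here, not a restatement of the Triton docs. The sketch below is plain Python and assumes `rm = pid * BLOCK_M + arange(BLOCK_M)`, the usual matmul-tutorial pattern. The helpers `block_offsets`, `hint_holds`, and `hint_safe_for_all_blocks` are hypothetical names used only for illustration.

```python
def block_offsets(pid: int, block_m: int, m: int) -> list[int]:
    """Hypothetical helper: the row offsets `rm % M` seen by program instance `pid`.

    Assumes rm = pid * BLOCK_M + arange(BLOCK_M), as in the common matmul pattern.
    """
    return [(pid * block_m + i) % m for i in range(block_m)]


def hint_holds(offsets: list[int], block_m: int) -> bool:
    """True if `offsets` is one run of consecutive integers whose first value is a
    multiple of block_m (our reading of what the combined hint asserts here)."""
    starts_aligned = offsets[0] % block_m == 0
    contiguous = all(b - a == 1 for a, b in zip(offsets, offsets[1:]))
    return starts_aligned and contiguous


def hint_safe_for_all_blocks(m: int, block_m: int, num_blocks: int) -> bool:
    """Check the property for every program instance along the M axis."""
    return all(hint_holds(block_offsets(pid, block_m, m), block_m) for pid in range(num_blocks))


if __name__ == "__main__":
    # M is a multiple of BLOCK_M (50304 == 128 * 393): no block ever wraps.
    print(hint_safe_for_all_blocks(m=50304, block_m=128, num_blocks=394))  # True
    # M is not a multiple of BLOCK_M: the last block wraps back to row 0,
    # so its offsets are no longer one contiguous run.
    print(hint_safe_for_all_blocks(m=50257, block_m=128, num_blocks=393))  # False
```

Under these assumptions the hint is safe exactly when no block wraps around, which holds whenever `M % BLOCK_M == 0`. Note that the same check fails for `M = 50304` with `BLOCK_M = 256`, since 50304 is not a multiple of 256.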

> Here is some perf test:
>
> 1. Changing `stride_ak` from 50304 to 50257 (i.e. cancel padding), perf is 27.69ms. In this case I think even if we can...
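For context on the two stride values in the quoted test: 50304 is 50257 rounded up to the next multiple of 64 (and also of 128). A plausible reason padding matters is row-start alignment: row `i` begins at byte offset `i * stride * elem_bytes`, so the alignment every row start is guaranteed to have is the largest power of two dividing `stride * elem_bytes`. The sketch below assumes 2-byte (fp16) elements and a base pointer at least that aligned. Both are assumptions, and the helpers `round_up` and `row_alignment_bytes` are hypothetical. It illustrates the arithmetic only and does not explain the 27.69ms figure.

```python
def round_up(n: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    return -(-n // multiple) * multiple


def row_alignment_bytes(stride_elems: int, elem_bytes: int) -> int:
    """Largest power of two dividing the row pitch in bytes.

    Every row start i * stride * elem_bytes is aligned to this many bytes,
    assuming the base pointer is at least that aligned.
    """
    pitch = stride_elems * elem_bytes
    return pitch & -pitch  # isolates the lowest set bit


if __name__ == "__main__":
    print(round_up(50257, 64), round_up(50257, 128))  # 50304 50304
    # Assumption: fp16 elements (2 bytes each).
    print(row_alignment_bytes(50304, 2))  # 256: padded rows start on 256-byte boundaries
    print(row_alignment_bytes(50257, 2))  # 2: unpadded rows are only 2-byte aligned
```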

> `rm % 50304` is not always in bound. The last row may not have extra items after it. The source tensor `X` does not have an out-of-bound issue but...