Yunjie
Repro (although it may not be a minimal one): [conv_relu_conv_relu_float16.py](https://gist.github.com/pyjhzwh/d9666e36ec7bd7963a0252ddb9351fbc#file-conv_relu_conv_relu_float16-py), [conv_relu_conv_relu_float32.py](https://gist.github.com/pyjhzwh/d9666e36ec7bd7963a0252ddb9351fbc#file-conv_relu_conv_relu_float32-py). `call()` does the forward computation of:

```
torch.nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 192, kernel_size=5, padding=2),
    nn.ReLU(inplace=True),
)
```

...
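For context, a minimal sketch of how a repro like this is typically driven. The exact harness in the gists is not shown here; the use of `torch.compile`, the input shape, and the tolerance check below are assumptions:

```
import torch
import torch.nn as nn

# Hypothetical harness; the AlexNet-style input shape and the torch.compile
# call are assumptions, not the gists' exact code.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 192, kernel_size=5, padding=2),
    nn.ReLU(inplace=True),
).cuda().half()

x = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.float16)
eager_out = model(x)                      # float16 eager reference
compiled_out = torch.compile(model)(x)   # inductor generates call()
print(torch.allclose(eager_out, compiled_out, atol=1e-2, rtol=1e-2))
```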
repro: https://gist.github.com/pyjhzwh/a19de7882aff600ee4472398b3017758

kernel0 basically does a matmul, multiplies the result by 1.0, then stores it back to the output buffer. buf1 is the output buffer of kernel0; given the config...
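The multiply-by-1.0 epilogue looks roughly like the following. This is a hypothetical sketch of the pattern the report describes, not the actual generated kernel; the pointer names and block-size parameter are made up:

```
import triton
import triton.language as tl

@triton.jit
def epilogue_sketch(acc_ptr, out_ptr, xnumel, XBLOCK: tl.constexpr):
    # Hypothetical epilogue shape: load the matmul accumulator, multiply by
    # 1.0 (a no-op the compiler could drop), and store to the output buffer.
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)
    xmask = xindex < xnumel
    acc = tl.load(acc_ptr + xindex, mask=xmask, other=0.0)
    acc = acc * 1.0  # the redundant multiply described above
    tl.store(out_ptr + xindex, acc, mask=xmask)
```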
repro: https://gist.github.com/pyjhzwh/2ba871a53c2eac6575948467317bafa1

```
matrix_x00 = tl.load(x00_ptrs, mask=mask_x00, other=0.)
matrix_x01 = tl.load(x01_ptrs, mask=mask_x01, other=0.)
```

where x00_ptrs and x01_ptrs are the same, and mask_x00 and mask_x01 are the same. But it will...
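Assuming the (truncated) complaint is that two identical loads are emitted, common-subexpression elimination would reduce the pair to a single load, along these lines:

```
# Sketch of the deduplicated form, assuming the issue is the redundant load:
matrix_x00 = tl.load(x00_ptrs, mask=mask_x00, other=0.)
matrix_x01 = matrix_x00  # identical pointers and masks, so one load suffices
```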
`map::at error` Repro:

```
import torch
import triton
from torch import empty_strided, as_strided
import triton.language as tl

@triton.jit
def kernel0(in_ptr0, out_ptr0, ks0, xnumel, rnumel, XBLOCK : tl.constexpr, RBLOCK : tl.constexpr):
    ...
```