Flish Wang
Hi, thanks for your work. It's wonderful. But I noticed that the bias arg in all nn.Linear layers in models/simsiam.py is set to its default value, True. To my knowledge,...
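For context: a Linear layer feeding directly into BatchNorm gains nothing from a bias term, since BN subtracts the per-feature batch mean, which cancels any constant offset. A minimal sketch in plain PyTorch (illustrative modules, not the actual simsiam.py code):

```python
import torch
from torch import nn

torch.manual_seed(0)

# Two projector blocks that differ only in the Linear bias; in training
# mode BatchNorm1d normalizes by batch statistics, so the bias cancels.
proj_with_bias = nn.Sequential(nn.Linear(16, 16, bias=True), nn.BatchNorm1d(16))
proj_no_bias = nn.Sequential(nn.Linear(16, 16, bias=False), nn.BatchNorm1d(16))

# Copy the weight so the two blocks share everything except the bias term.
with torch.no_grad():
    proj_no_bias[0].weight.copy_(proj_with_bias[0].weight)

x = torch.randn(8, 16)
y1 = proj_with_bias(x)
y2 = proj_no_bias(x)
print(torch.allclose(y1, y2, atol=1e-5))  # the bias has no effect here
```

This is why bias=False is often used for Linear layers followed by BN: the parameter is redundant, though keeping it at the default True is harmless for correctness.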
Hi, I'm new to Triton. I noticed that we use tl.make_block_ptr in the forward kernel:

```
Q_block_ptr = tl.make_block_ptr(
    base=Q + qvk_offset,
    shape=(N_CTX, BLOCK_DMODEL),
    strides=(stride_qm, stride_qk),
    offsets=(start_m * BLOCK_M, 0),
...
```
### 🐛 Describe the bug

Complete code uploaded to: [minifer.py](https://gist.github.com/flishwang/9e561371966ab12c7f3709f7315aea14)

Key code (lines 990 to 1068):

```
use_block_attn = sys.argv[-1] == '1'
print(f'use_block_attn = {use_block_attn} {sys.argv[-1]}')
model = ViTA(
...
```
In models/build.py:41, the keyword passed to partial(SwinTransformer, ...) is norm_befor_mlp, while the keyword in SwinTransformer (models/swin_transformer.py:497) is norm_before_mlp. The former is missing the letter 'e' compared with the latter. Therefore, the 'bn'...
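For anyone hitting a similar issue: functools.partial does not validate keyword names when it is created, and if the callee accepts **kwargs, a misspelled key is swallowed silently instead of raising a TypeError. A small illustrative sketch (toy function, not the actual Swin builder):

```python
from functools import partial

def build_model(norm_before_mlp='ln', **kwargs):
    # Unknown keys such as a misspelled 'norm_befor_mlp' land in kwargs
    # and are silently ignored, so the intended value never takes effect.
    return norm_before_mlp

good = partial(build_model, norm_before_mlp='bn')
typo = partial(build_model, norm_befor_mlp='bn')  # missing 'e': no error raised

print(good())  # 'bn'
print(typo())  # 'ln' -- the misspelled setting is dropped, the default wins
```

A catch-all **kwargs makes typos like this fail silently, which is why the wrong norm could ship unnoticed.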
### 🐛 Describe the bug

The following code may fail:

```
import torch
from torch import nn

class A(nn.Module):
    def __init__(self):
        super().__init__()
        self.p = nn.Parameter(torch.zeros((1, 8, 1, 1, 256)))

a = A().to(memory_format=torch.channels_last)
...
```
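A likely cause (hedged: based on the rank requirements of PyTorch memory formats) is that torch.channels_last is a rank-4 layout, while rank-5 tensors need torch.channels_last_3d; a module-level .to(memory_format=...) can then trip over a 5-d parameter. A minimal per-tensor sketch:

```python
import torch

t4 = torch.zeros(2, 8, 4, 4)     # rank-4: channels_last applies
t5 = torch.zeros(2, 8, 3, 4, 5)  # rank-5: needs channels_last_3d

ok4 = t4.to(memory_format=torch.channels_last)
ok5 = t5.to(memory_format=torch.channels_last_3d)

try:
    # Applying the rank-4 format to a rank-5 tensor is expected to fail.
    t5.to(memory_format=torch.channels_last)
    raised = False
except RuntimeError:
    raised = True

print(ok4.is_contiguous(memory_format=torch.channels_last))
print(ok5.is_contiguous(memory_format=torch.channels_last_3d))
print(raised)
```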
I want to compile a Triton kernel and use it on different machines. Are there any argument options in tools/compile.py or environment variables (like the TORCH_CUDA_ARCH_LIST variable when building...