Varuna Jayasiri
Looks like the number of tokens is different from the number of tokens used during training. Did you change the dataset or run BPE again?
This sounds like a bug. The dimensions of the embedding weight matrix are the number of tokens and the number of embedding features (`d_model`).
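For reference, a minimal sketch of why a changed vocabulary breaks checkpoint loading; the numbers and names here are just illustrative:

```python
import torch.nn as nn

n_tokens, d_model = 50_257, 768           # illustrative vocabulary and embedding sizes
emb = nn.Embedding(n_tokens, d_model)     # weight shape: [n_tokens, d_model]
print(emb.weight.shape)                   # torch.Size([50257, 768])

# A checkpoint saved with a different vocabulary size has an embedding weight
# with a different first dimension, so it can no longer be loaded into `emb`.
```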
I will give it a try and see if I can reproduce. Are you running the latest master? Did you make changes? Also is the dataset the same?
Thanks, you are right! That's a typo and a big bug!
This is strange. I guess the wrong softmax provides a non-linearity similar to the correct softmax, and gradient descent finds a way to use it. But I don't understand...
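A rough sketch of what I mean, assuming the softmax was applied over the wrong dimension of the attention scores (I'm guessing at the setup here):

```python
import torch
import torch.nn.functional as F

scores = torch.randn(4, 6)            # [n_queries, n_keys], illustrative shape

correct = F.softmax(scores, dim=-1)   # normalizes over keys: each query's weights sum to 1
wrong = F.softmax(scores, dim=0)      # normalizes over queries instead

print(correct.sum(dim=-1))            # tensor of ones, as attention expects
print(wrong.sum(dim=0))               # also ones, but along the wrong axis

# Both are smooth, bounded functions of the scores, which may be why
# gradient descent can still make some use of the wrong one.
```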
Should we do that, or `final_output[indexes_list[i], :] = expert_output[i].to(x.dtype)`? Because it seems like you changed the expert to `bfloat16` while the transformer's general processing was in `float32`, and you...
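A small sketch of what I have in mind, reusing the names from the snippet above; the surrounding setup is an assumption, not the actual code:

```python
import torch

x = torch.randn(8, 16)                      # transformer activations in float32
final_output = torch.zeros_like(x)          # keep the accumulator in the transformer's dtype

indexes_list = [torch.tensor([0, 2, 5]), torch.tensor([1, 3, 4, 6, 7])]
experts = [torch.nn.Linear(16, 16).bfloat16() for _ in range(len(indexes_list))]

# Each expert runs in bfloat16 on its share of the tokens
expert_output = [experts[i](x[indexes_list[i]].bfloat16()) for i in range(len(experts))]

for i in range(len(experts)):
    # Cast the bfloat16 result back to x.dtype before scattering it into final_output,
    # so everything downstream of the experts stays in float32
    final_output[indexes_list[i], :] = expert_output[i].to(x.dtype)
```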
The feature map size doesn't change. Can you please point to the comment that mentions it does? The [blocks are in a `nn.Sequential`](https://nn.labml.ai/resnet/index.html#section-63)
Yeah, that is correct. The feature map size stays the same. When I asked for the comment, I meant the one you said didn't match the code.
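A minimal sketch (not the actual code from nn.labml.ai) of what I mean by the feature map size staying the same across the blocks in the `nn.Sequential`:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """A plain residual block with stride 1, so the spatial size is preserved."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(x + self.conv2(self.act(self.conv1(x))))

# Stack the blocks in an nn.Sequential, as the annotated implementation does
blocks = nn.Sequential(*[ResidualBlock(64) for _ in range(3)])

x = torch.randn(1, 64, 32, 32)
print(blocks(x).shape)   # torch.Size([1, 64, 32, 32]) – same feature map size
```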
We will try to add automatic translations.
We have machine-translated the comments into Chinese: https://nn.labml.ai/zh/