BitNet_Llama_model_test_huggingface_GPU.ipynb
I was testing in Colab, and when I ran `model.model.layers[0].mlp.gate_proj.weight` I received very different results from yours. You got:

```
Parameter containing:
tensor([[ 0.0032, -0.0339,  0.0150,  ...,  0.0041, -0.0048,  0.0061],
        [-0.0105, -0.0049, -0.0586,  ..., -0.0092,  0.0188, -0.0084],
        [-0.0383, -0.0109,  0.0031,  ..., -0.0410,  0.0211,  0.0223],
        ...,
        [ 0.0131, -0.0259,  0.0034,  ...,  0.0233, -0.0281, -0.0131],
        [ 0.0062,  0.0198,  0.0085,  ...,  0.0129, -0.0205,  0.0050],
        [ 0.0292,  0.0152, -0.0175,  ...,  0.0256,  0.0276,  0.0082]],
       device='cuda:0', dtype=torch.bfloat16, requires_grad=True)
```
I got:

```
tensor([[ 0.0007,  0.0007,  0.0007,  ...,  0.0007, -0.0007,  0.0007],
        [ 0.0007,  0.0007,  0.0007,  ...,  0.0007,  0.0007, -0.0007],
        [ 0.0007,  0.0007, -0.0007,  ..., -0.0007, -0.0007,  0.0007],
        ...,
        [ 0.0007, -0.0007,  0.0007,  ..., -0.0007, -0.0007,  0.0007],
        [ 0.0007,  0.0007,  0.0007,  ..., -0.0007, -0.0007, -0.0007],
        [ 0.0007, -0.0007,  0.0007,  ...,  0.0007,  0.0007,  0.0007]],
       device='cuda:0', dtype=torch.bfloat16)
```
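One plausible explanation (an assumption on my part, not confirmed in this thread): every weight having the same magnitude ±0.0007 is exactly what a BitNet b1.58-style absmean ternary quantization produces, since all weights in a tensor collapse to {-1, 0, +1} times one shared scale. A minimal sketch of that scheme, using NumPy and an invented helper name for illustration:

```python
import numpy as np

def absmean_ternary_quantize(w, eps=1e-5):
    """Sketch of BitNet b1.58-style absmean quantization.

    Scale by the mean absolute value, round to {-1, 0, +1},
    then rescale -- so every nonzero weight ends up at +/- the
    same magnitude, like the +/-0.0007 pattern above.
    """
    scale = max(np.abs(w).mean(), eps)        # shared per-tensor scale
    w_q = np.clip(np.round(w / scale), -1, 1)  # ternary {-1, 0, 1}
    return w_q * scale                         # dequantized view

rng = np.random.default_rng(0)
w = rng.normal(scale=0.02, size=(4, 4))
wq = absmean_ternary_quantize(w)
print(np.unique(np.abs(wq)))  # at most two values: 0 and the scale
```

If that guess is right, the difference would come down to one side inspecting the raw (bf16 master) weights and the other inspecting weights after quantization has been applied to the checkpoint.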
Thanks for flagging this!
Could you share the Colab code you ran? That would help us track down the issue 😄
https://colab.research.google.com/drive/1nvzhy_PCBZ_r6dlvQv3GfweJsGlZHrNJ?usp=sharing