BitNet-Transformers
Weird behavior / implementation error?
I took the code for BitLinearOptimized and added one small change so I can run it standalone:
```python
super(BitLinearOptimized, self).__init__(in_features, out_features, bias, dtype=torch.bfloat16)
# just added the right dtype
```
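For context, here is roughly the standalone version I ended up with. This is a paraphrase of the repo's BitLinearOptimized with a simplified dequantization step (no scaling factor), not a verbatim copy; the point is only the buffer/property pattern that moves the weight out of parameters():

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BitLinearOptimized(nn.Linear):
    def __init__(self, in_features, out_features, bias=True):
        # Just added the right dtype; the rest follows the repo's pattern.
        super().__init__(in_features, out_features, bias, dtype=torch.bfloat16)
        # Keep a 1-bit quantized copy of the weights as an int8 buffer.
        # Buffers are saved in the state_dict but are NOT returned by parameters().
        self.register_buffer(
            "quantized_weights", torch.sign(self.weight.data).to(torch.int8)
        )
        # Drop the original full-precision weight Parameter to save memory.
        del self.weight

    @property
    def weight(self):
        # Dequantize on access (simplified here: no scaling).
        # The result is derived from a buffer, so it is not trainable.
        return self.quantized_weights.to(torch.bfloat16)

    def forward(self, x):
        return F.linear(x, self.weight, self.bias)
```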
Running the following:
```python
w = BitLinearOptimized(1, 1)
x = torch.ones(1, dtype=torch.bfloat16)
y = w(x)
print(list(w.parameters()))
```
gives:

```
[Parameter containing: tensor([0.0703], dtype=torch.bfloat16, requires_grad=True)]
```
This means only a single parameter is registered, and judging by its 1-D shape it is the bias, not the weight, so the weight itself is no longer a trainable parameter. Is this intended behavior? I saw that your training code uses model.parameters(), so the weights would never be passed to the optimizer, which seems like it would be an issue.
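If it helps, this is how I would double-check which tensors end up trainable. This is hypothetical diagnostic code run against the sketch above, not something from the repo; the buffer name quantized_weights is the one the repo's code uses:

```python
# Show what is registered as a Parameter vs. as a buffer.
w = BitLinearOptimized(1, 1)
print(dict(w.named_parameters()))  # only 'bias' shows up
print(dict(w.named_buffers()))     # the weight lives in 'quantized_weights'

# An optimizer built from w.parameters() would therefore only ever
# see the bias, never the weights:
opt = torch.optim.SGD(w.parameters(), lr=0.1)
print(sum(p.numel() for group in opt.param_groups for p in group["params"]))  # 1
```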