Huang Haiduo
> Hi @haiduo , you could check these papers: https://arxiv.org/pdf/1502.01852.pdf, https://arxiv.org/pdf/1606.05340.pdf, https://arxiv.org/pdf/1611.01232.pdf, all of which analyze training dynamics for centered weight. I am not sure how to analyze weights with...
> Hi @haiduo, thanks again for your interest. For b=4, it maps [-1, 1] to [0, 1], to {0, 1, ..., 15}, to {0.5, 1.5, ..., 15.5}, to {1/32, 3/32,...
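The steps quoted above for b=4 can be traced with a small sketch. The quoted reply is cut off after `3/32`, so this stops at that stage and does not guess what follows. The function name `quantize_midpoint`, the floor-based binning, and the clamping at the top edge are assumptions for illustration, not the repository's code.

```python
from fractions import Fraction


def quantize_midpoint(x: float, bits: int = 4) -> Fraction:
    """Map x in [-1, 1] to the midpoint of one of 2**bits equal bins in [0, 1].

    Hypothetical helper: binning by floor and clamping are assumptions.
    """
    levels = 2 ** bits                      # 16 levels for b=4
    x = max(-1.0, min(1.0, x))              # keep the input inside [-1, 1]
    unit = (x + 1.0) / 2.0                  # [-1, 1] -> [0, 1]
    index = min(int(unit * levels), levels - 1)  # [0, 1] -> {0, 1, ..., 15}
    # (index + 0.5) / levels, written exactly as an odd multiple of 1/(2*levels)
    return Fraction(2 * index + 1, 2 * levels)


if __name__ == "__main__":
    for value in (-1.0, -0.9, 0.0, 1.0):
        print(value, quantize_midpoint(value))
```

Using `Fraction` keeps the outputs exact, so the first two levels print as `1/32` and `3/32`, matching the values shown in the quote.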
I have solved it! The bug stems from https://github.com/Lightning-AI/lit-llama/blob/da71adea0970d6d950fb966d365cfb428aef8298/lit_llama/model.py#L130. I changed the dtype selection there so that it falls back to float16 when bfloat16 is not available: add `from transformers.utils import is_torch_bf16_gpu_available`, then use `dtype=torch.bfloat16 if is_torch_bf16_gpu_available() else torch.float16,`.
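The same fallback idea can be sketched without `transformers`, assuming PyTorch's own `torch.cuda.is_bf16_supported()` check is an acceptable substitute for `is_torch_bf16_gpu_available()`. The helper names `bf16_supported` and `pick_half_dtype_name` are hypothetical, and the function returns the dtype name as a string so the sketch also runs where PyTorch is not installed.

```python
def bf16_supported() -> bool:
    """Return True only if PyTorch is installed and the GPU reports bf16 support."""
    try:
        import torch
    except ImportError:
        return False  # assumption: treat a missing PyTorch as "no bf16"
    return torch.cuda.is_available() and torch.cuda.is_bf16_supported()


def pick_half_dtype_name(bf16_ok: bool) -> str:
    """Prefer bfloat16, fall back to float16 when bf16 is unavailable."""
    return "bfloat16" if bf16_ok else "float16"


if __name__ == "__main__":
    name = pick_half_dtype_name(bf16_supported())
    print(name)  # with PyTorch installed, getattr(torch, name) gives the dtype
```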
> Have you figured it out? To me, the line 428 is somehow another form of equation(3) in the paper. `out[..., :-1]` is equal to QK^TV, and `out[..., -1:]` is...