Huang Haiduo comments

Results: 26 comments by Huang Haiduo

Bad performance on CIFAR using on low bit width

> Hi @haiduo , you could check these papers: https://arxiv.org/pdf/1502.01852.pdf, https://arxiv.org/pdf/1606.05340.pdf, https://arxiv.org/pdf/1611.01232.pdf, all of which analyze training dynamics for centered weight. I am not sure how to analyze weights with...

Bad performance on CIFAR using on low bit width

> Hi @haiduo, thanks again for your interest. For b=4, it maps [-1, 1] to [0, 1], to {0, 1, ..., 15}, to {0.5, 1.5, ..., 15.5}, to {1/32, 3/32,...
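
For anyone following the b=4 mapping quoted above, here is a minimal sketch of only the steps that are visible before the quote is cut off. The function name `quantize_unit_steps` is hypothetical, and the floor-and-clip rule used to get from [0, 1] to {0, 1, ..., 15} is my assumption, since the quote does not say how the integer bin is chosen. Whatever follows `{1/32, 3/32,...` in the original reply is not reproduced here.

```python
import math


def quantize_unit_steps(x: float, bits: int = 4) -> float:
    """Apply the visible steps of the quoted mapping to one weight x in [-1, 1]."""
    levels = 2 ** bits                 # 16 levels for b=4
    unit = (x + 1.0) / 2.0             # [-1, 1] -> [0, 1]
    # Assumption: floor, then clip so that x = 1 falls in the top bin.
    index = min(levels - 1, max(0, math.floor(unit * levels)))  # -> {0, ..., 15}
    center = index + 0.5               # -> {0.5, 1.5, ..., 15.5}
    return center / levels             # -> odd multiples of 1/32


if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        print(x, quantize_unit_steps(x))
```

With b=4 the outputs are the 16 bin centers in [0, 1], so every representable value is an odd multiple of 1/32 and none of them is exactly 0.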

RuntimeError: Expected x1.dtype() == cos.dtype() to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

I have solved it! See below. The bug stems from https://github.com/Lightning-AI/lit-llama/blob/da71adea0970d6d950fb966d365cfb428aef8298/lit_llama/model.py#L130. I changed that line as follows: first `from transformers.utils import is_torch_bf16_gpu_available`, then `dtype=torch.bfloat16 if is_torch_bf16_gpu_available() else torch.float16,`

Why padding in attention?

> Have you figured it out? To me, the line 428 is somehow another form of equation(3) in the paper. `out[..., :-1]` is equal to QK^TV, and `out[..., -1:]` is...

The inference speed is measured with pytorch or tensorRT model??

mark +1