Huang Haiduo
> Hi @haiduo, you could check these papers: https://arxiv.org/pdf/1502.01852.pdf, https://arxiv.org/pdf/1606.05340.pdf, https://arxiv.org/pdf/1611.01232.pdf, all of which analyze training dynamics for centered weights. I am not sure how to analyze weights with...
> Hi @haiduo, thanks again for your interest. For b=4, it maps [-1, 1] to [0, 1], to {0, 1, ..., 15}, to {0.5, 1.5, ..., 15.5}, to {1/32, 3/32,...
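Read literally, that mapping chain for b=4 could be sketched as below. This is only my reading of the quoted steps, not the authors' code; in particular, how the endpoint x = 1 is handled (the `clamp`) is my assumption.

```python
import torch

def quantize(x: torch.Tensor, b: int = 4) -> torch.Tensor:
    """Sketch of the chain: [-1, 1] -> [0, 1] -> {0, ..., 2^b - 1}
    -> {0.5, ..., 2^b - 0.5} -> {1/2^(b+1), 3/2^(b+1), ...}."""
    n = 2 ** b                             # 16 levels for b = 4
    u = (x.clamp(-1, 1) + 1) / 2           # [-1, 1] -> [0, 1]
    q = (u * n).floor().clamp(max=n - 1)   # [0, 1] -> {0, 1, ..., 15}
    m = q + 0.5                            # -> {0.5, 1.5, ..., 15.5} (bin midpoints)
    return m / n                           # -> {1/32, 3/32, ..., 31/32}
```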
I have solved it! Look at the following: the bug stems from https://github.com/Lightning-AI/lit-llama/blob/da71adea0970d6d950fb966d365cfb428aef8298/lit_llama/model.py#L130. I managed to fix it with `from transformers.utils import is_torch_bf16_gpu_available` and `dtype=torch.bfloat16 if is_torch_bf16_gpu_available() else torch.float16,`
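For context, the fix boils down to choosing the dtype at runtime instead of hardcoding bfloat16, which only newer GPUs (Ampere and later) support. A minimal sketch; the exact call site in model.py may differ:

```python
import torch
from transformers.utils import is_torch_bf16_gpu_available

# Use bfloat16 only when the current GPU actually supports it;
# otherwise fall back to float16.
dtype = torch.bfloat16 if is_torch_bf16_gpu_available() else torch.float16
print(dtype)
```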
> Have you figured it out? To me, line 428 appears to be another form of equation (3) in the paper. `out[..., :-1]` is equal to QK^TV, and `out[..., -1:]` is...
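If I read the quote correctly, this is the common trick of appending a ones column to V so that a single matmul yields both the unnormalized attention output and its normalizer in the last channel. A minimal sketch under that assumption (the actual line 428 may differ, and this omits the usual max-subtraction for numerical stability):

```python
import torch

def attn_fused_normalizer(q, k, v):
    # Append a column of ones to v; after the matmul, the last channel
    # holds the row-wise sum of the unnormalized attention weights.
    ones = torch.ones(*v.shape[:-1], 1, dtype=v.dtype, device=v.device)
    v_ext = torch.cat([v, ones], dim=-1)
    w = torch.exp(q @ k.transpose(-2, -1))   # unnormalized weights
    out = w @ v_ext                          # fused numerator + denominator
    return out[..., :-1] / out[..., -1:]     # equals softmax(QK^T) V

q, k, v = (torch.randn(2, 4, 8) for _ in range(3))
ref = torch.softmax(q @ k.transpose(-2, -1), dim=-1) @ v
assert torch.allclose(attn_fused_normalizer(q, k, v), ref, atol=1e-5)
```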
> The base model has a layer normalization (layernorm) layer before the LM head. Since the feature sequence has already been normalized, we do not use layer normalization. It is...
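As a concrete reading of that design choice: in a typical decoder, the final hidden states pass through a LayerNorm before the LM head, so an extra head that consumes those already-normalized features can skip its own norm. A minimal sketch of my understanding (class names are hypothetical):

```python
import torch.nn as nn

class BaseHead(nn.Module):
    """Base model: normalize hidden states, then project to the vocabulary."""
    def __init__(self, hidden: int, vocab: int):
        super().__init__()
        self.norm = nn.LayerNorm(hidden)
        self.lm_head = nn.Linear(hidden, vocab, bias=False)

    def forward(self, h):
        return self.lm_head(self.norm(h))

class ExtraHead(nn.Module):
    """Head fed the post-norm feature sequence: no second LayerNorm needed."""
    def __init__(self, hidden: int, vocab: int):
        super().__init__()
        self.lm_head = nn.Linear(hidden, vocab, bias=False)

    def forward(self, h_normed):
        return self.lm_head(h_normed)
```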
Let me try to answer this. There should be no need to expand the total tokens, but you need to ensure that the total tokens of the modified...
After reading your comment, I find this phenomenon very interesting, so I tried it just now and found that the output logits of the draft model and target model are...
> > Thank you both for your careful observation; these details are very helpful.
> >
> > I wonder whether it is possible to change the comparison of two float values...
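Assuming the suggestion is to replace exact equality between floats with a tolerance-based check (my reading of the truncated quote), a minimal sketch:

```python
import torch

a = torch.tensor([1.0000001])
b = torch.tensor([1.0000002])

# Exact equality is brittle for floats produced by different kernels or dtypes.
print(torch.equal(a, b))                            # False
# A tolerance-based comparison is usually what is wanted instead.
print(torch.allclose(a, b, rtol=1e-5, atol=1e-6))   # True
```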