Andrej: 373 comments

Another reason I am usually suspicious of asymmetric quantization, btw, is that it doesn't guarantee that zero is exactly represented. In symmetric, 0 = 0 for sure. In asymmetric it...
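A minimal sketch of the point above, in plain Python with hypothetical helper names (no claim this matches any particular repo's implementation): symmetric quantization has an implicit zero-point of 0, so the value 0.0 always round-trips exactly.

```python
def quantize_symmetric(x, bits=8):
    # Symmetric int quantization: scale maps [-max|x|, +max|x|] onto
    # [-qmax, qmax]. The zero-point is implicitly 0, so an input of
    # exactly 0.0 quantizes to integer 0 and dequantizes back to 0.0.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in x) / qmax
    q = [round(v / scale) for v in x]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

q, s = quantize_symmetric([-1.0, 0.0, 0.5, 2.0])
# the 0.0 entry is integer 0, and 0 * scale == 0.0 exactly
```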

@kroggen you're right, it just requires a bit of extra logic, but this would be the preferred way if we ended up using an asymmetric encoding.
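One way the "bit of extra logic" could look, as a hedged sketch in plain Python (hypothetical names, not the PR's actual code): in an asymmetric encoding, rounding the zero-point to an integer is what restores an exact representation of 0.0.

```python
def quantize_asymmetric(x, bits=8):
    # Asymmetric uint quantization: maps [min(x), max(x)] onto [0, qmax]
    # with an explicit zero-point. Rounding the zero-point to an integer
    # (the extra logic) guarantees that an input of exactly 0.0 lands on
    # it and therefore round-trips exactly.
    qmax = 2 ** bits - 1
    lo, hi = min(x + [0.0]), max(x + [0.0])  # make sure 0 is in range
    scale = (hi - lo) / qmax
    zero_point = round(-lo / scale)          # integer zero-point
    q = [min(qmax, max(0, round(v / scale) + zero_point)) for v in x]
    return q, scale, zero_point

def dequantize_asym(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]

q, s, zp = quantize_asymmetric([-1.0, 0.0, 2.0])
# the 0.0 entry quantizes to the zero-point, so it dequantizes to 0.0 exactly
```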

Quantization here is per layer instead of per group. That feels risky? I'd expect llama.cpp uses groups?

One outlier nukes the whole tensor. I'm starting a branch for int8 quantization now. I'll do groups.

If there is a bad outlier somewhere, only e.g. up to 63 elements get "messed up" with high error, not the entire tensor. So breaking things up into groups makes...
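A toy sketch of the argument, in plain Python (hypothetical helper name, illustrative group size of 64; the repo's actual implementation may differ): with per-group scales, a single outlier only inflates the quantization error inside its own group, instead of flattening every small value in the tensor.

```python
def quantize_groups(x, group_size=64, bits=8):
    # Group-wise symmetric quantization: each consecutive chunk of
    # `group_size` values gets its own scale, so one large outlier only
    # determines the scale (and hence the error) of its own group.
    qmax = 2 ** (bits - 1) - 1
    out = []
    for i in range(0, len(x), group_size):
        g = x[i:i + group_size]
        scale = max(abs(v) for v in g) / qmax
        out.append(([round(v / scale) for v in g], scale))
    return out

# 128 small values with one big outlier in the second half:
x = [0.01] * 64 + [100.0] + [0.01] * 63
groups = quantize_groups(x)
# group 0 (no outlier): small values survive almost perfectly
# group 1 (outlier): the 0.01 entries collapse to 0, but the damage
# is confined to these 64 elements, not the whole tensor
```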

I am really not sure about this branch. For me, all of the _t stuff is visual clutter; it makes it harder for me to parse the code. Maybe we...

This is really cool! Wow. Do you have any stats on the 110M or, even better, the 7B model? What are your thoughts on how we maintain all the copy-paste...

(Btw I really want to get around to the CUDA versions but still a lot of the "basics" of the repo are not where I want them to be. I...

This PR would break the repo, the move would need careful treatment. I'm not 100% sure it's a good idea yet, let me think

For others who might stumble here in the future: the current implementation here atm is definitely wrong for some models/tokenizers. E.g. the GPT series would not work. Sentencepiece also would not...