byte-6174

19 comments by byte-6174

Following up on the little pseudo-code above, I checked in a small experiment that does weights/activations quantization on the fly. Needs more experiments, as blindly making all matmuls run on int8...
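
Not the checked-in experiment itself, but a minimal sketch of what on-the-fly weights/activations quantization can look like, assuming symmetric absmax int8 quantization; `quantize` and `matmul_q8` are illustrative names, not functions from the repo:

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Symmetric per-tensor quantization: scale = max|x| / 127. */
static float quantize(const float* x, int8_t* q, int n) {
    float amax = 0.0f;
    for (int i = 0; i < n; i++) {
        float a = fabsf(x[i]);
        if (a > amax) amax = a;
    }
    float scale = amax > 0.0f ? amax / 127.0f : 1.0f;
    for (int i = 0; i < n; i++) {
        q[i] = (int8_t) roundf(x[i] / scale);
    }
    return scale;
}

/* int8 matmul: W is (d x n) row-major, x has length n, out has length d.
 * Both the activation vector and each weight row are quantized on the
 * fly; the dot product accumulates in int32 and is rescaled to float. */
static void matmul_q8(float* out, const float* x, const float* w, int n, int d) {
    int8_t* qx = malloc(n * sizeof(int8_t));
    int8_t* qw = malloc(n * sizeof(int8_t));
    float sx = quantize(x, qx, n);
    for (int i = 0; i < d; i++) {
        float sw = quantize(w + i * n, qw, n);  /* quantize this row on the fly */
        int32_t acc = 0;
        for (int j = 0; j < n; j++) acc += (int32_t) qx[j] * (int32_t) qw[j];
        out[i] = acc * sx * sw;  /* rescale the int32 accumulator to float */
    }
    free(qx);
    free(qw);
}
```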

Yes, that change was needed for the Llama2 7B model. Thanks @kroggen!

```
./runq data.bin -n 16 -i "why is sky blue?"
why is sky blue? Here's a theory everyone's...
```

Yes. Llama2.cpp has groups of 64, etc. Why risky?

Hmm, trying to understand this. So how do the groups of 64 avoid this? You mean outliers in the magnitude sense, I'm presuming?
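
For context, a minimal sketch of group-wise int8 quantization with groups of 64 (`GS = 64` and the function name are assumptions here, not code from either repo). Because each group of 64 gets its own scale, a single large-magnitude outlier only coarsens the 64 values in its own group, instead of stretching one per-tensor scale and crushing every other value toward zero:

```c
#include <math.h>
#include <stdint.h>

#define GS 64  /* group size; assumed, per the "groups of 64" above */

/* Symmetric int8 quantization with one scale per group of GS values.
 * Assumes n is a multiple of GS. */
static void quantize_groups(const float* x, int8_t* q, float* scales, int n) {
    for (int g = 0; g < n / GS; g++) {
        const float* xg = x + g * GS;
        float amax = 0.0f;
        for (int i = 0; i < GS; i++) {
            float a = fabsf(xg[i]);
            if (a > amax) amax = a;
        }
        float scale = amax > 0.0f ? amax / 127.0f : 1.0f;
        scales[g] = scale;  /* an outlier only inflates this group's scale */
        for (int i = 0; i < GS; i++) {
            q[g * GS + i] = (int8_t) roundf(xg[i] / scale);
        }
    }
}
```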

Btw, as a side note: there is experimental evidence, in llama.cpp and also in places like LLM.int8(), of needing mixed precision to tackle outliers. Thought we might want to / have...
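
Very loosely, the mixed-precision idea referenced here, sketched per element for illustration (this is in the spirit of LLM.int8(), not its actual implementation, which decomposes by feature column across the batch; `THRESHOLD` is an assumed cutoff): dimensions whose activation magnitude exceeds the threshold stay in float32, everything else goes through the int8 path.

```c
#include <math.h>
#include <stdint.h>

#define THRESHOLD 6.0f  /* assumed outlier cutoff, for illustration only */

/* Mixed-precision dot product. Assumes qx/qw were quantized from x/w
 * with scales sx/sw (e.g. by an absmax quantizer like the one above). */
static float dot_mixed(const float* x, const float* w,
                       const int8_t* qx, const int8_t* qw,
                       float sx, float sw, int n) {
    int32_t acc_i8 = 0;   /* int8 path for "normal" dimensions */
    float acc_fp = 0.0f;  /* float path for outlier dimensions */
    for (int j = 0; j < n; j++) {
        if (fabsf(x[j]) > THRESHOLD) {
            acc_fp += x[j] * w[j];                           /* outlier: full precision */
        } else {
            acc_i8 += (int32_t) qx[j] * (int32_t) qw[j];     /* normal: int8 */
        }
    }
    return acc_fp + acc_i8 * sx * sw;
}
```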

@jrudolph, for activation quantization, do you use data statistics? If so, what data is used? If no data is used, how are the activations quantized in your Scala implementation? Sorry,...
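
To make the question concrete, this is the distinction it is asking about, as an illustrative C sketch rather than anything from the Scala implementation: with data statistics, the activation scale is calibrated once from sample inputs and frozen; without data, it has to be recomputed from the live activations on every forward pass (dynamic quantization).

```c
#include <math.h>

/* Absmax scale for a vector of activations. */
static float absmax_scale(const float* x, int n) {
    float amax = 0.0f;
    for (int i = 0; i < n; i++) {
        float a = fabsf(x[i]);
        if (a > amax) amax = a;
    }
    return amax / 127.0f;
}

/* Static (needs data): scale = absmax_scale(calibration_acts, n), stored
 * with the model and reused for every inference. Dynamic (data-free):
 * call absmax_scale(current_acts, n) inside every matmul, trading an
 * extra pass over the activations for not needing any calibration set. */
```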

I'm trying to understand, at a basic level, how it is done from the links you provided above. I understand that there are many complex mix-and-match strategies, like keeping certain...

@atamurad:

> LLama2-7B-chat, Q8_A/Q8_B => 6.7GB model size, output is OK but slow as expected.

Slower than the float32 model?