neoweasley comments


neoweasley

[Questions]: How to implement NF4/NF2 matmul kernel function?

The paper "QLoRA: Efficient Finetuning of Quantized LLMs" says, "In practice, this means whenever a QLORA weight tensor is used, we dequantize the tensor to BFloat16, and then perform a...
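
For reference, the dequantize-then-multiply flow described in that quote can be sketched in plain NumPy. This is only a minimal illustration under several assumptions, not the paper's or any library's actual kernel: the helper names (`quantize_nf4`, `dequantize_nf4`, `nf4_linear`) are hypothetical, the codebook holds the 16 NF4 levels rounded to four decimals, the block size of 64 is an assumed default, float32 stands in for BFloat16 (which NumPy lacks), and the 4-bit indices are stored one per byte rather than packed two per byte.

```python
import numpy as np

# Approximate NF4 codebook: 16 levels in [-1, 1], rounded to 4 decimals.
NF4_LEVELS = np.array([
    -1.0, -0.6962, -0.5251, -0.3949, -0.2844, -0.1848, -0.0911, 0.0,
    0.0796, 0.1609, 0.2461, 0.3379, 0.4407, 0.5626, 0.7230, 1.0,
], dtype=np.float32)


def quantize_nf4(w, block_size=64):
    """Blockwise absmax quantization of `w` to 4-bit codebook indices.

    Returns (idx, absmax): idx has shape (n_blocks, block_size) and dtype
    uint8; absmax has shape (n_blocks, 1). w.size must be divisible by
    block_size.
    """
    flat = w.astype(np.float32).reshape(-1, block_size)
    absmax = np.abs(flat).max(axis=1, keepdims=True)
    absmax[absmax == 0] = 1.0  # avoid division by zero for all-zero blocks
    normalized = flat / absmax  # every value now lies in [-1, 1]
    # Pick the nearest codebook level for every element.
    dist = np.abs(normalized[..., None] - NF4_LEVELS)
    idx = dist.argmin(axis=-1).astype(np.uint8)
    return idx, absmax


def dequantize_nf4(idx, absmax, shape):
    """Look up codebook levels and rescale each block by its absmax."""
    return (NF4_LEVELS[idx] * absmax).reshape(shape)


def nf4_linear(x, idx, absmax, weight_shape):
    """Dequantize the weight, then do an ordinary matmul: y = x @ W^T."""
    w = dequantize_nf4(idx, absmax, weight_shape)
    return x @ w.T


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 64)).astype(np.float32)  # (out_features, in_features)
x = rng.standard_normal((2, 64)).astype(np.float32)  # (batch, in_features)

idx, absmax = quantize_nf4(W)
y = nf4_linear(x, idx, absmax, W.shape)
print(y.shape)
```

In this sketch the matmul itself never sees 4-bit data: the weight is expanded back to a floating-point matrix first, which is the behaviour the quoted sentence describes. A fused kernel would instead do the codebook lookup and rescaling inside the matmul loop, and that is not shown here.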