Tim Dettmers

106 comments by Tim Dettmers

This kernel is meant for 4-bit vector-matrix multiplication, a common use case in token-by-token inference/generation. In the backward pass, however, processing one token at a time is unusual. More commonly, a backward...
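To make the shapes concrete, here is a minimal, dependency-free sketch of what a 4-bit vector-matrix product computes. It assumes a simple symmetric absmax int4 scheme with one scale per block of the flattened weights. bitsandbytes itself uses NF4/FP4 data types and fused CUDA kernels, so the int4 scheme, the block size, and the function names below are illustrative assumptions, not the library's API. During generation the input is a single token vector, and each weight is dequantized on the fly inside the dot product.

```python
# Illustrative sketch only: symmetric absmax int4 with one scale per block.
# bitsandbytes uses NF4/FP4 codes and fused CUDA kernels instead.

BLOCK = 4   # hypothetical block size, kept tiny for the example
LEVELS = 7  # codes are limited to [-7, 7] in this symmetric scheme


def quantize_blockwise_int4(values, block=BLOCK):
    """Quantize a flat list of floats to int codes in [-7, 7], one absmax scale per block."""
    codes, scales = [], []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        absmax = max(abs(v) for v in chunk) or 1.0  # all-zero block: any scale works
        scales.append(absmax)
        codes.extend(round(v / absmax * LEVELS) for v in chunk)
    return codes, scales


def dequantize_blockwise_int4(codes, scales, block=BLOCK):
    """Map codes back to floats using the scale of the block each code belongs to."""
    return [c / LEVELS * scales[i // block] for i, c in enumerate(codes)]


def vecmat_4bit(x, codes, scales, rows, cols, block=BLOCK):
    """Compute y = x @ W for one token vector x (length rows).

    W is rows x cols, stored row-major as 4-bit codes; each weight is
    dequantized inside the inner loop instead of materializing W first.
    """
    y = [0.0] * cols
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            w = codes[i] / LEVELS * scales[i // block]
            y[c] += x[r] * w
    return y


# One "token" (a 3-dim vector) times a 3x2 weight matrix stored in 4 bits.
W = [0.5, -1.0,
     0.25, 2.0,
     -3.5, 0.0]
codes, scales = quantize_blockwise_int4(W)
x = [1.0, 2.0, -1.0]
y = vecmat_4bit(x, codes, scales, rows=3, cols=2)
# y approximates the full-precision result [4.5, 3.0] up to 4-bit rounding error.
```

In a backward pass over a sequence, the input is a full matrix with one row per token, so a vector kernel like this would have to be called once per row. That is presumably why a token-by-token backward is the unusual case.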

Sorry, we really messed this up. This was so close to being integrated into bitsandbytes, and we failed on the bitsandbytes side to go the last mile. This was...

We are closing this for now but will reopen if we start working on this. Again, thank you for bringing this so far and sorry for messing this up.

Very good catch. I think this would be a good contribution. In general, the latency overhead of the dequantization operation is currently the biggest slowdown for these kernels. I think if...

Instead of replying, I quickly tried to implement it, but I failed. Despite this, it might be a good starting point for an implementation. You can find my changes on...

You can do this by using the 4-bit quantization functions on the weights of the model. One way to do this manually, in pseudocode:

```
for module in...
```
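The pseudocode above is truncated after `for module in`. A minimal sketch of the general idea, assuming a model is just a list of layer objects: walk the modules, quantize each linear layer's weights blockwise to 4 bits, and drop the full-precision copy. `Linear`, `absmax_quantize_int4`, and `quantize_model` are hypothetical names for illustration, and the int4 absmax scheme stands in for the NF4/FP4 formats bitsandbytes actually uses.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Linear:
    """Hypothetical stand-in for a linear layer; weight is a flat row-major list."""
    weight: Optional[list]
    codes: list = field(default_factory=list)   # 4-bit codes after quantization
    scales: list = field(default_factory=list)  # one absmax scale per block


def absmax_quantize_int4(values, block=64):
    """Illustrative blockwise int4 quantizer (not the bitsandbytes NF4/FP4 scheme)."""
    codes, scales = [], []
    for start in range(0, len(values), block):
        chunk = values[start:start + block]
        absmax = max(abs(v) for v in chunk) or 1.0  # guard all-zero blocks
        scales.append(absmax)
        codes.extend(round(v / absmax * 7) for v in chunk)
    return codes, scales


def quantize_model(modules, block=64):
    """Quantize the weights of every Linear layer in place; other modules are untouched."""
    for module in modules:
        if isinstance(module, Linear):
            module.codes, module.scales = absmax_quantize_int4(module.weight, block)
            module.weight = None  # drop the full-precision copy to save memory
    return modules


model = [Linear([0.1, -0.2, 0.3, 0.4]), "relu", Linear([1.0, 2.0])]
quantize_model(model, block=2)
```

With a real PyTorch model you would instead iterate over `model.modules()` and replace each linear layer with a 4-bit layer from bitsandbytes such as `Linear4bit`; this sketch only shows the control flow.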