bitsandbytes
Is it possible to enable fused op F.gemv_4bit in F.gemv_4bit backward?
Feature request
Enable fused op F.gemv_4bit in F.gemv_4bit backward
Motivation
The forward and backward passes in 4-bit perform the same computation, so I was wondering whether the fused op could be enabled in the backward pass as well.
Your contribution
Hi @Titus-von-Koeller. I see that when the inputs require gradients, the fused op is disabled here. I am not sure whether we could also enable F.gemv_4bit in the backward pass, since both forward and backward do the same thing (dequantize the 4-bit parameters and run a floating-point matmul). WDYT?
This kernel is meant for 4-bit vector-matrix multiplication, which is the common case in token-by-token inference/generation. In the backward pass, however, a token-by-token backward is unusual.
More commonly, a backward pass backpropagates through a whole sequence or a batch of sequences. That is a matrix-matrix multiplication, for which this kernel is slower than the standard 4-bit kernel.
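To make the shape argument concrete, here is a minimal pure-Python sketch. It is not bitsandbytes code: the helper names (`quantize_4bit`, `dequantize_4bit`, `pick_kernel`, `forward`, `backward_input`) are hypothetical, and the quantization is a toy per-row absmax scheme rather than NF4/FP4. It only illustrates that forward and backward both dequantize and then multiply, and that the number of rows in the left operand decides whether the product is vector-matrix or matrix-matrix.

```python
# Toy illustration only: plain lists stand in for tensors, and the 4-bit scheme
# below is a simple per-row absmax quantizer (an assumption, not NF4/FP4).

def quantize_4bit(weight):
    """Map each row to integer codes in [-7, 7] plus one absmax scale per row."""
    codes, scales = [], []
    for row in weight:
        scale = max(abs(v) for v in row) / 7 or 1.0  # avoid a zero scale
        codes.append([round(v / scale) for v in row])
        scales.append(scale)
    return codes, scales


def dequantize_4bit(codes, scales):
    """Recover floating-point weights from codes and per-row scales."""
    return [[c * s for c in row] for row, s in zip(codes, scales)]


def matmul(a, b):
    """Naive (rows x k) @ (k x cols) product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def pick_kernel(num_rows):
    """Hypothetical dispatch: one row is vector-matrix, more rows is matrix-matrix."""
    return "gemv" if num_rows == 1 else "gemm"


def forward(x, codes, scales):
    """y = x @ W^T, with W of shape (out_features, in_features)."""
    w = dequantize_4bit(codes, scales)
    return matmul(x, transpose(w)), pick_kernel(len(x))


def backward_input(grad_out, codes, scales):
    """dx = dy @ W: same dequantize-then-matmul pattern as the forward pass."""
    w = dequantize_4bit(codes, scales)
    return matmul(grad_out, w), pick_kernel(len(grad_out))


if __name__ == "__main__":
    codes, scales = quantize_4bit([[7.0, -7.0], [1.0, 7.0]])
    # Generation: a single token, so the product is vector-matrix.
    print(forward([[1.0, 2.0]], codes, scales))
    # Training: gradients for three tokens arrive together, so it is matrix-matrix.
    print(backward_input([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], codes, scales))
```

In this sketch the backward pass would only hit the `gemv` branch if the upstream gradient had a single row, which is the token-by-token backward case described above as unusual.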