
Is it possible to enable fused op F.gemv_4bit in F.gemv_4bit backward?

Open jiqing-feng opened this issue 1 year ago • 1 comments

Feature request

Enable fused op F.gemv_4bit in F.gemv_4bit backward

Motivation

The forward and backward passes in 4-bit perform the same calculations, so I was wondering whether we could enable the fused op in the backward pass as well.

Your contribution

Hi @Titus-von-Koeller. I see that when the inputs require gradients, the fused op is disabled here. I am not sure whether we could also enable F.gemv_4bit in the backward pass, since both forward and backward do the same thing (dequantize the 4-bit parameter and perform a floating-point matmul). WDYT?
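To make the question concrete, here is a minimal sketch of the kind of dispatch check being described. The helper name `would_use_fused_gemv` and the exact condition are assumptions for illustration, not a copy of the bitsandbytes source. It only models the two facts mentioned in this thread: the fused path targets a single input vector, and it is skipped when the input requires gradients.

```python
from math import prod


def would_use_fused_gemv(input_shape, requires_grad):
    """Hypothetical helper: True when a fused vector-matrix path would apply."""
    # Exactly one row of activations: the total element count equals the
    # size of the last (feature) dimension.
    is_single_vector = prod(input_shape) == input_shape[-1]
    # As noted in this issue, the fused op is disabled when the input
    # requires gradients, so those calls fall back to the generic path.
    return is_single_vector and not requires_grad


# Token-by-token generation, no gradients: fused path applies.
print(would_use_fused_gemv((1, 1, 4096), requires_grad=False))
# Same shape, but gradients are needed: falls back.
print(would_use_fused_gemv((1, 1, 4096), requires_grad=True))
# A batch of sequences is a matrix, not a vector: falls back.
print(would_use_fused_gemv((8, 512, 4096), requires_grad=False))
```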

jiqing-feng avatar May 31 '24 02:05 jiqing-feng

This kernel is meant for 4-bit vector-matrix multiplication, which is a common use-case for token-by-token inference/generation; however, in the backward pass, a token-by-token backward is unusual.

More commonly, a backward pass is used to backpropagate through a sequence or a batch of sequences -- which is a matrix-matrix multiplication, and for that case this kernel will be slower than the other standard 4-bit kernel.
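The shape argument above can be illustrated with a toy, pure-Python sketch. Everything here is an assumption made for illustration: the quantization scheme (4-bit codes offset by 8 with one scale per block) is a stand-in and is not the NF4/FP4 format bitsandbytes uses, and the function names are hypothetical. The point is only that forward and backward both dequantize and then multiply, but the left operand is a single vector during token-by-token generation and a matrix when backpropagating a whole sequence.

```python
def dequantize(codes, scales, blocksize, cols):
    """Toy blockwise dequantization of 4-bit codes (0..15) into a 2-D weight."""
    flat = [(c - 8) * scales[i // blocksize] for i, c in enumerate(codes)]
    return [flat[r:r + cols] for r in range(0, len(flat), cols)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain (rows of a) x (columns of b) matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def forward(x, codes, scales, blocksize, cols):
    w = dequantize(codes, scales, blocksize, cols)  # (out_features, in_features)
    return matmul(x, transpose(w))                  # (tokens, out_features)


def backward(grad_out, codes, scales, blocksize, cols):
    w = dequantize(codes, scales, blocksize, cols)  # same dequantization step
    return matmul(grad_out, w)                      # (tokens, in_features)


codes = [9, 8, 7, 10, 8, 8, 12, 6]  # 2 x 4 weight stored as 4-bit codes
scales = [0.5, 0.25]                # one scale per block of 4 codes

# Generation: one token, so the product is vector-matrix.
y = forward([[1, 2, 3, 4]], codes, scales, blocksize=4, cols=4)

# Training: gradients for 3 tokens arrive together, so the product is
# matrix-matrix even though the dequantization is identical.
grad_in = backward([[1, 0], [0, 1], [1, 1]], codes, scales, blocksize=4, cols=4)

print(y)
print(len(grad_in), "x", len(grad_in[0]))
```

In this sketch the only difference between the two calls is the number of rows in the left operand, which is the property that decides whether a vector-matrix kernel is a good fit.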

TimDettmers avatar Jun 03 '24 17:06 TimDettmers