vector-quantize-pytorch

Support half precision for VQ and FSQ

Open JunityZhan opened this issue 1 year ago • 6 comments

Hi, I noticed the library does not support other precisions. I made this tiny change, tried a simple example, and it works in fp16 and bf16. There is an x = x.float() call, which I just commented out. I don't know if it is necessary; in my experiments everything works fine, and we can pass in any precision we like.
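
Roughly, the idea looks like this (a minimal sketch; the function and variable names are illustrative, not the library's actual internals):

```python
import torch

# minimal sketch: keep the input dtype instead of always casting to fp32
def quantize_preserving_dtype(x, codebook):
    orig_dtype = x.dtype                      # e.g. torch.float16 or torch.bfloat16
    # x = x.float()                           # <- the cast that was commented out
    codebook = codebook.to(orig_dtype)
    # nearest-codebook lookup in the input's own precision
    dists = (x.unsqueeze(-2) - codebook).pow(2).sum(dim = -1)
    indices = dists.argmin(dim = -1)
    return codebook[indices], indices
```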

JunityZhan avatar Jun 30 '24 15:06 JunityZhan

@JunityZhan are you sure? researchers are telling me it doesn't perform well at all with low precision.

lucidrains avatar Jun 30 '24 17:06 lucidrains

> @JunityZhan are you sure? researchers are telling me it doesn't perform well at all with low precision.

For training, I think it is better to use fp32. For inference, however, I switched to fp16, and the lower precision does not change the quantized result at all. For example, take 0.53244378278482 in fp32 and 0.5324437 in fp16 (just an example): run both through FSQ and they round to exactly the same level.
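
A toy illustration of that point (simplified FSQ-style rounding, not the library's actual code):

```python
import torch

def fsq_round(x, levels = 8):
    # map values in (-1, 1) onto `levels` discrete steps, as FSQ does after bounding
    half = (levels - 1) / 2
    return torch.round(x.clamp(-1, 1) * half) / half

x32 = torch.tensor([0.53244378278482], dtype = torch.float32)
x16 = x32.to(torch.float16)

print(fsq_round(x32))           # tensor([0.5714])
print(fsq_round(x16.float()))   # same quantized level, despite the lost mantissa bits
```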

JunityZhan avatar Jul 03 '24 09:07 JunityZhan

btw, it fixes #145

JunityZhan avatar Jul 03 '24 09:07 JunityZhan

@JunityZhan ok, if what you are saying is true, then i can think about it

but your fix will break for your own condition, "In training, I think it is better to train it with fp32", correct?

lucidrains avatar Jul 03 '24 13:07 lucidrains

@lucidrains I am not sure I understand what you mean about my fix breaking my own condition. If I don't make that change, I cannot run inference in fp16. The change just makes sure we can choose whatever precision we like: training in fp16 or fp32, inference in fp16 or fp32, all of them are allowed. Besides, even though in my tests it is better to train in fp32, there may still be tasks where fp16, bf16, or another precision works for training, not to mention the need to run inference in fp16.
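
The kind of usage I have in mind looks roughly like this (a hypothetical snippet: it assumes a CUDA device and that the forced fp32 cast inside the module is removed):

```python
import torch
from vector_quantize_pytorch import FSQ

# hypothetical fp16 inference, assuming the internal x = x.float() cast is gone
fsq = FSQ(levels = [8, 5, 5, 5]).cuda().half()

x = torch.randn(1, 1024, 4, device = 'cuda', dtype = torch.float16)

with torch.no_grad():
    quantized, indices = fsq(x)

print(quantized.dtype)   # torch.float16 instead of being silently upcast
```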

JunityZhan avatar Jul 03 '24 15:07 JunityZhan

@JunityZhan right right. the difference is whether i always enforce f32 during training so that people (non researchers) have a greater chance of success using the library, or allow for flexibility. let me think about it
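
A sketch of what that compromise could look like (force_f32 and nearest_code are illustrative names, not necessarily what the library ends up using): quantize inside an autocast-disabled fp32 region by default, but let users opt out.

```python
import torch

def nearest_code(x, codebook):
    # nearest-codebook lookup; x: (..., d), codebook: (codes, d)
    dists = (x.unsqueeze(-2) - codebook).pow(2).sum(dim = -1)
    indices = dists.argmin(dim = -1)
    return codebook[indices], indices

def quantize(x, codebook, force_f32 = True):
    # force_f32 = True : the safe default for training, quantize internally in fp32
    # force_f32 = False: quantize in whatever dtype x arrives in
    orig_dtype = x.dtype
    if force_f32:
        with torch.autocast(device_type = 'cuda', enabled = False):
            quantized, indices = nearest_code(x.float(), codebook.float())
        return quantized.to(orig_dtype), indices
    return nearest_code(x, codebook.to(orig_dtype))
```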

lucidrains avatar Jul 03 '24 15:07 lucidrains

@JunityZhan do you want to see if setting this to False works for you?

lucidrains avatar Jul 06 '24 13:07 lucidrains

> @JunityZhan do you want to see if setting this to False works for you?

I think you only made the modification to the lookup free quantization module, not to VQ and FSQ.

JunityZhan avatar Jul 06 '24 13:07 JunityZhan

@JunityZhan ahh yes, you wanted to do FSQ, let me apply that as well

lucidrains avatar Jul 06 '24 13:07 lucidrains

@JunityZhan ok, try it now for FSQ and if it works out, i'll do the same strat for vq

lucidrains avatar Jul 06 '24 13:07 lucidrains

@JunityZhan ok, Marco reports it is working, closing

lucidrains avatar Jul 11 '24 13:07 lucidrains