Inference Time 2X Slower
Hi all,
When I run inference with my finetuned model loaded in 4-bit on 2 GPUs, it is 2X slower than the original model after 4-bit quantization.
The model I used is MOSS, and I use 2 GPUs for inference because of an OOM issue. I wonder why this happens. I really hope you could help me out.
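For reference, this is roughly how I load the model. It's a minimal sketch assuming Hugging Face transformers with bitsandbytes; the checkpoint id below is just a placeholder for my finetuned MOSS:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder id; substitute the actual finetuned MOSS checkpoint.
model_id = "fnlp/moss-moon-003-sft"

# bnb_4bit_compute_dtype defaults to float32, which makes the
# dequantize-then-matmul path slower; bfloat16/float16 usually helps.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # shards layers across both GPUs to avoid OOM
    trust_remote_code=True,
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note that with `device_map="auto"` the layers are split across the two GPUs and executed sequentially, so each token also pays cross-GPU transfer overhead on top of the per-matmul 4-bit dequantization cost; that may account for part of the slowdown.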
@lukaswangbk did you find a solution?