unsloth model merge for vllm inference

Open Sangh0 opened this issue 10 months ago • 3 comments

I have a question. Training with unsloth is done in 4-bit, but merging the model for vLLM inference converts the data type to 16-bit. Is this not a problem? Or are the weights rescaled using the quantization constants?

Apr 17 '24 06:04 Sangh0

@Sangh0 There will be basically no performance degradation. Yes, I think scaling is performed? @danielhanchen

Apr 17 '24 06:04 mahiatlinux

Yes, it upcasts to 16-bit for vLLM, so it's not an issue.

Apr 17 '24 08:04 shimmyshimmer
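
To make the "scaling with quantization constants" point concrete, here is a minimal sketch of blockwise absmax quantization followed by dequantization and an adapter merge. It is an illustration under stated assumptions, not Unsloth's or bitsandbytes' actual code: it uses a plain linear 4-bit grid instead of the NF4 codebook, works on a single small block in pure Python, and the function names (`quantize_block`, `dequantize_block`, `merge_row`) and the example values are hypothetical.

```python
# Simplified illustration only: linear 4-bit grid with one scale per block.
# Real 4-bit training stacks (e.g. NF4) use a non-uniform codebook, but the
# role of the per-block quantization constant is the same.

def quantize_block(block):
    """Map floats to integer codes in [-7, 7] plus one scale per block."""
    absmax = max(abs(x) for x in block) or 1.0  # avoid a zero scale
    scale = absmax / 7.0                        # the quantization constant
    codes = [round(x / scale) for x in block]
    return codes, scale

def dequantize_block(codes, scale):
    """Upcast: multiply each 4-bit code by the stored scale."""
    return [c * scale for c in codes]

def merge_row(base_row, lora_delta_row):
    """Add the (already computed) adapter update to the upcast base weights."""
    return [w + d for w, d in zip(base_row, lora_delta_row)]

# Hypothetical example values.
weights = [0.70, -0.30, 0.12, 0.00]
codes, scale = quantize_block(weights)

restored = dequantize_block(codes, scale)   # higher-precision values again
lora_delta = [0.01, 0.02, -0.01, 0.00]      # stands in for the LoRA update
merged = merge_row(restored, lora_delta)

print("codes:   ", codes)
print("restored:", restored)
print("merged:  ", merged)
```

The rounding error is introduced once, when the base weights are quantized; dequantizing with the stored scale does not add further error, which is consistent with the answers above that upcasting for the merge is not a problem. In Unsloth itself, the 16-bit merge is typically requested with `model.save_pretrained_merged(..., save_method="merged_16bit")`.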

Thank you for your answer.

Apr 17 '24 08:04 Sangh0
