unsloth
Model merge for vLLM inference
I have a question. When training with Unsloth, the model is in 4-bit, but when merging the model for vLLM inference, the data type is converted to 16-bit. Is this not a problem? Or are the weights rescaled using the quantization constants?
@Sangh0 There will be basically no performance degradation. Yes, I think scaling is performed? @danielhanchen
Yes, it upcasts to 16-bit for vLLM, so it's not an issue.
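For anyone wondering what "upcasting with the quantization constants" looks like in principle, here is a minimal pure-Python sketch. It is not Unsloth's or bitsandbytes' actual code: the function names are hypothetical, and it uses a simple uniform 4-bit absmax scheme, whereas real 4-bit formats such as NF4 use a non-uniform codebook. The idea it illustrates is the same: each block of 4-bit codes is multiplied by its stored scale constant, the result is stored as 16-bit, and the LoRA update is then added to those 16-bit weights.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def quantize_block(block):
    """Absmax-quantize one block of weights to signed 4-bit codes in [-7, 7].

    Returns the integer codes plus the per-block scale (the "quantization
    constant") needed to undo the scaling later.
    """
    absmax = max(abs(v) for v in block) or 1.0
    codes = [round(v / absmax * 7) for v in block]
    return codes, absmax


def dequantize_block(codes, absmax):
    """Rescale the 4-bit codes with the stored constant and keep them as 16-bit."""
    return [to_fp16(c / 7 * absmax) for c in codes]


def merge_lora(weights, delta):
    """Add a (hypothetical) LoRA weight update to the upcast 16-bit weights."""
    return [to_fp16(w + d) for w, d in zip(weights, delta)]


# Toy data, made up for illustration only.
base = [0.50, -0.25, 0.10, -0.70]
lora_delta = [0.01, 0.02, -0.01, 0.00]

codes, absmax = quantize_block(base)        # what is held in 4-bit during training
restored = dequantize_block(codes, absmax)  # upcast to 16-bit using the constant
merged = merge_lora(restored, lora_delta)   # weights that would be exported

# Error introduced by 4-bit quantization itself; the upcast adds almost nothing.
max_error = max(abs(a - b) for a, b in zip(base, restored))
print(codes, absmax, restored, merged, max_error)
```

The rounding error here comes from the 4-bit step, which the adapter was already trained against. Storing the rescaled values in 16-bit only adds a much smaller half-precision rounding on top, which is why the merge is not expected to hurt quality.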
Thank you for your answer.