
DJL 0.30 SageMaker endpoint deployment using vLLM: option.quantization for a quantized model is not working

Open · adi7820 opened this issue 1 year ago · 0 comments

Hello,

I'm trying to deploy a Llama 3.2 Vision 4-bit bitsandbytes-quantized model as a SageMaker endpoint, but I've encountered an error regarding quantization.

[Screenshot: error traceback showing vLLM receiving quantization as None]

As shown in the screenshot above, vLLM reports that it is receiving quantization as 'None', even though I set the property while creating the serving.properties for the SageMaker endpoint.

```
%%writefile serving.properties
engine=Python
option.model_id=unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit
option.rolling_batch=vllm
option.dtype=bf16
option.max_model_len=8192
option.max_num_seqs=1
option.enforce_eager=True
option.gpu_memory_utilization=0.9
option.quantization=bitsandbytes
option.load_format=bitsandbytes
```
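For what it's worth, DJL LMI containers also accept these settings as `OPTION_*` environment variables on the SageMaker model, which can help rule out a properties-file parsing problem. Below is a minimal sketch using the SageMaker Python SDK; the role ARN, image URI tag, and instance type are placeholders, not values from this issue:

```python
import sagemaker
from sagemaker.model import Model

# Placeholder values: substitute your own role and region-specific image URI.
role = "arn:aws:iam::123456789012:role/MySageMakerRole"
image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.30.0-lmi"  # placeholder tag

# serving.properties keys mapped to OPTION_* environment variables.
env = {
    "HF_MODEL_ID": "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit",
    "OPTION_ROLLING_BATCH": "vllm",
    "OPTION_DTYPE": "bf16",
    "OPTION_MAX_MODEL_LEN": "8192",
    "OPTION_MAX_NUM_SEQS": "1",
    "OPTION_ENFORCE_EAGER": "true",
    "OPTION_GPU_MEMORY_UTILIZATION": "0.9",
    "OPTION_QUANTIZATION": "bitsandbytes",
    "OPTION_LOAD_FORMAT": "bitsandbytes",
}

model = Model(image_uri=image_uri, role=role, env=env)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # placeholder instance type
)
```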

When I traced the error in the vLLM GitHub repo, I found that the actual cause is that the quantization parameter is not receiving its value.

[Screenshot: vLLM source showing where the quantization argument is read]
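Assuming the container's vLLM version supports bitsandbytes, here is a minimal sketch of the equivalent direct vLLM call (values copied from the serving.properties above); running it outside DJL can confirm whether vLLM itself accepts the quantization argument:

```python
# Minimal local reproduction sketch, assuming vllm and bitsandbytes
# are installed (pip install vllm bitsandbytes).
from vllm import LLM

llm = LLM(
    model="unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit",
    dtype="bfloat16",
    max_model_len=8192,
    max_num_seqs=1,
    enforce_eager=True,
    gpu_memory_utilization=0.9,
    quantization="bitsandbytes",  # the argument reportedly arriving as None
    load_format="bitsandbytes",
)

# Simple smoke test: generate from a short prompt.
print(llm.generate("Hello")[0].outputs[0].text)
```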

Can you help me with a possible solution, or do I need to wait for another version?

adi7820 · Nov 28 '24