
DJL 0.30 SageMaker endpoint deployment using vLLM: option.quantization for a quantized model is not working

Open · adi7820 opened this issue 1 year ago · 0 comments

Hello,

I'm trying to deploy a Llama 3.2 Vision 4-bit bitsandbytes-quantized model as a SageMaker endpoint, but I've encountered an error regarding quantization.

[Screenshot: error traceback showing vLLM receiving quantization as None]

As shown in the screenshot above, vLLM reports that it is receiving quantization as 'None', even though I set the property while creating the serving.properties for the SageMaker endpoint.

```
%%writefile serving.properties
engine=Python
option.model_id=unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit
option.rolling_batch=vllm
option.dtype=bf16
option.max_model_len=8192
option.max_num_seqs=1
option.enforce_eager=True
option.gpu_memory_utilization=0.9
option.quantization=bitsandbytes
option.load_format=bitsandbytes
```
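For what it's worth, DJL LMI containers also accept these settings as `OPTION_*` environment variables on the SageMaker model, which can help rule out a properties-file parsing problem. Below is a minimal sketch using the SageMaker Python SDK; the role ARN, image URI tag, and instance type are placeholders, not values from this issue:

```python
import sagemaker
from sagemaker.model import Model

# Placeholder values: substitute your own role and region-specific image URI.
role = "arn:aws:iam::123456789012:role/MySageMakerRole"
image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.30.0-lmi"  # placeholder tag

# serving.properties keys mapped to OPTION_* environment variables.
env = {
    "HF_MODEL_ID": "unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit",
    "OPTION_ROLLING_BATCH": "vllm",
    "OPTION_DTYPE": "bf16",
    "OPTION_MAX_MODEL_LEN": "8192",
    "OPTION_MAX_NUM_SEQS": "1",
    "OPTION_ENFORCE_EAGER": "true",
    "OPTION_GPU_MEMORY_UTILIZATION": "0.9",
    "OPTION_QUANTIZATION": "bitsandbytes",
    "OPTION_LOAD_FORMAT": "bitsandbytes",
}

model = Model(image_uri=image_uri, role=role, env=env)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # placeholder instance type
)
```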

When I traced the error in the vLLM GitHub repo, I found that the actual cause is that the quantization parameter is not receiving its value.

[Screenshot: vLLM source showing where the quantization argument is read]
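Assuming the container's vLLM version supports bitsandbytes, here is a minimal sketch of the equivalent direct vLLM call (values copied from the serving.properties above); running it outside DJL can confirm whether vLLM itself accepts the quantization argument:

```python
# Minimal local reproduction sketch, assuming vllm and bitsandbytes
# are installed (pip install vllm bitsandbytes).
from vllm import LLM

llm = LLM(
    model="unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit",
    dtype="bfloat16",
    max_model_len=8192,
    max_num_seqs=1,
    enforce_eager=True,
    gpu_memory_utilization=0.9,
    quantization="bitsandbytes",  # the argument reportedly arriving as None
    load_format="bitsandbytes",
)

# Simple smoke test: generate from a short prompt.
print(llm.generate("Hello")[0].outputs[0].text)
```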

Can you help me with a possible solution, or do I need to wait for another version?

adi7820 · Nov 28 '24