DJL 0.30 SageMaker endpoint deployment with vLLM: option.quantization for a quantized model is not working
Hello,
I'm trying to deploy a 4-bit bitsandbytes-quantized Llama 3.2 Vision model as a SageMaker endpoint, but I've run into an error related to quantization.
As the error in the image above shows, vLLM is receiving quantization as 'None', even though I set that property in the serving.properties I created when configuring the SageMaker endpoint:
```
%%writefile serving.properties
engine=Python
option.model_id=unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit
option.rolling_batch=vllm
option.dtype=bf16
option.max_model_len=8192
option.max_num_seqs=1
option.enforce_eager=True
option.gpu_memory_utilization=0.9
option.quantization=bitsandbytes
option.load_format=bitsandbytes
```
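For completeness, this is roughly how I'm creating the endpoint with the SageMaker Python SDK. The role, instance type, and container URI below are placeholders rather than my exact values, and the LMI image tag for DJL 0.30 should be checked against the AWS docs for your region:

```python
import sagemaker
from sagemaker.model import Model

# Placeholder values -- substitute your own role and region.
role = "arn:aws:iam::123456789012:role/MySageMakerRole"
session = sagemaker.Session()

# serving.properties (from above) is packaged into model.tar.gz and uploaded to S3.
code_artifact = session.upload_data(
    "model.tar.gz",
    bucket=session.default_bucket(),
    key_prefix="llama32-vision",
)

# DJL LMI container for DJL 0.30 / vLLM -- verify the exact URI for your region.
image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.30.0-lmi12.0.0-cu124"

model = Model(
    image_uri=image_uri,
    model_data=code_artifact,
    role=role,
    sagemaker_session=session,
)
model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # placeholder instance type
    endpoint_name="llama32-vision-bnb-4bit",
)
```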
When I traced the error back in the vLLM GitHub repo, I found that the actual cause is that the quantization parameter is never receiving its value.
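For comparison, this is how I would expect those serving.properties options to map onto vLLM's own Python API. This is only a sketch to show the settings vLLM should end up with; I haven't been able to run it inside the DJL container:

```python
from vllm import LLM

# Direct vLLM equivalent of the serving.properties above -- here quantization
# is passed explicitly, which is the value the endpoint apparently drops.
llm = LLM(
    model="unsloth/Llama-3.2-11B-Vision-Instruct-bnb-4bit",
    dtype="bfloat16",
    max_model_len=8192,
    max_num_seqs=1,
    enforce_eager=True,
    gpu_memory_utilization=0.9,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
)
```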
Can you help me find a possible solution, or do I need to wait for another version?