Unable to load model with LoRA adapter using DJL image with SageMaker fast model loader
Description
When attempting to load Meta-Llama-3.1-8B-Instruct-AWQ-INT4 with a LoRA adapter using the DJL image with SageMaker fast model loader, the model fails to load properly.
Expected Behavior
The model should load successfully with the LoRA adapter applied, allowing for inference with the adapted model weights.
Error Message
[INFO ] PyProcess - W-140-model-stdout: [1,0]<stdout>:RuntimeError: Number of chunk files and slices do not match.
2025-03-12T05:35:42.978Z
Caused by: ai.djl.engine.EngineException: Failed to initialize model: invoke handler failure
2025-03-12T05:36:05.015Z [INFO ] PyProcess - W-140-model-stdout: [1,0]<stdout>:INFO 03-12 05:36:05 loader.py:27] SageMakerFastModelLoader loading shard for tp_rank: 0, pp_rank: 0
2025-03-12T05:36:05.194Z [INFO ] PyProcess - W-140-model-stdout: [1,0]<stdout>:**ERROR::Failed invoke service.invoke_handler()
How to Reproduce?
- Shard the model using SageMaker Studio by running an optimization job, and place the adapter in the `adapters` folder where the model is located.
- Create an endpoint with the DJL image and the model artifacts.
- Add these configurations in `serving.properties`:
  option.load_format=sagemaker_fast_model_loader
  option.tensor_parallel_degree=1
  option.max_model_len=16000
  serving.max_model_len=16000
  option.enable_lora=true
  option.max_loras=10
- Check the error logs in CloudWatch.
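
For anyone reproducing this, a small local check of the configuration can rule out typos before the endpoint is created. The sketch below is only an illustration: `parse_properties` is a hypothetical helper written for this issue, not part of DJL, and it assumes `serving.properties` uses plain `key=value` lines.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments.

    Hypothetical helper for illustration; not a DJL API.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            props[key.strip()] = value.strip()
    return props


# The exact settings listed in the reproduction steps.
SERVING_PROPERTIES = """\
option.load_format=sagemaker_fast_model_loader
option.tensor_parallel_degree=1
option.max_model_len=16000
serving.max_model_len=16000
option.enable_lora=true
option.max_loras=10
"""

props = parse_properties(SERVING_PROPERTIES)
lora_enabled = props.get("option.enable_lora", "false").lower() == "true"
uses_fast_loader = props.get("option.load_format") == "sagemaker_fast_model_loader"

# Both flags are set in this report; the failure appears with the combination.
print(f"LoRA enabled: {lora_enabled}, fast model loader: {uses_fast_loader}")
```

This only confirms what the container will read from the file; it says nothing about whether the two options are supported together.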
What Have You Tried to Solve It?
- Followed the folder structure and modified `serving.properties` as mentioned in the official DJL documentation: [Multi-LoRA Adapter Inference Guide](https://docs.djl.ai/master/docs/demos/aws/sagemaker/large-model-inference/sample-llm/multi_lora_adapter_inference.html)
- Despite these steps, the model still fails to load successfully.
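
The folder structure can also be verified locally before uploading the artifacts. The following is a minimal sketch under stated assumptions: each adapter is assumed to live in its own subdirectory of `adapters/` and to contain an `adapter_config.json` file (the usual PEFT convention); `list_adapters` and the example path are hypothetical names, not DJL APIs.

```python
from pathlib import Path


def list_adapters(model_dir: Path) -> list[str]:
    """Return names of subfolders under <model_dir>/adapters that look like adapters.

    Assumption: a valid adapter folder contains adapter_config.json.
    """
    adapters_dir = model_dir / "adapters"
    if not adapters_dir.is_dir():
        return []
    return sorted(
        child.name
        for child in adapters_dir.iterdir()
        if child.is_dir() and (child / "adapter_config.json").is_file()
    )


def check_layout(model_dir: Path) -> list[str]:
    """Collect human-readable problems with the model directory layout."""
    problems = []
    if not (model_dir / "serving.properties").is_file():
        problems.append("serving.properties not found in model directory")
    if not list_adapters(model_dir):
        problems.append("no adapter folders with adapter_config.json under adapters/")
    return problems
```

Calling `check_layout(Path("/path/to/model"))` (a placeholder path) returns an empty list when both checks pass. A clean result here would suggest the failure lies in the loader rather than in the artifact layout.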
Additional Information
- DJL Image Version: 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.31.0-lmi13.0.0-cu124