djl-serving

A universal scalable machine learning model deployment solution

Results: 54 djl-serving issues

## Description ## Achieve better handling at problematic points

## Description ## Add TGI compatibility support

## Description ## - Extract rolling batch handling out of huggingface.py; the UX won't change, and the handler will still be `djl_python.huggingface`
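
For context, the built-in handler is selected in `serving.properties`. A minimal sketch, where the model ID and rolling-batch backend are placeholders rather than values from the PR:

```properties
engine=Python
# Handler entry point stays the same after the refactor
option.entryPoint=djl_python.huggingface
# Hypothetical model; a Hugging Face model ID or S3 path could go here
option.model_id=my-org/my-model
option.rolling_batch=auto
```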

## Description ## This PR introduces a new Actions workflow for running integration tests that will verify the functionality of LMI DLCs when used in a SageMaker Neo...

## Description ## Brief description of what this PR is about - If this change is a backward incompatible change, why must this change be made? - Interesting edge cases...

## Description Running djl-inference:0.27.0-neuronx-sdk2.18.1 with the Hugging Face model google/gemma-7b-it fails. ### Error Message WARN PyProcess W-93-model-stderr: --- Logging error --- WARN PyProcess W-93-model-stderr: Traceback (most recent call last): WARN PyProcess W-93-model-stderr:...

bug

## Description (A clear and concise description of what the bug is.) I am building the DJL-Serving TensorRT-LLM LMI inference container from scratch, and deploying on SageMaker Endpoints for Zephyr-7B...

bug
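
For readers reproducing a deployment like the one above, here is a minimal sketch using the SageMaker Python SDK; the ECR image URI, model ID, environment variables, and instance type are assumptions for illustration, not values from the report:

```python
import sagemaker
from sagemaker.model import Model

role = sagemaker.get_execution_role()

# Hypothetical URI of a custom-built LMI TensorRT-LLM image pushed to ECR
image_uri = "<account-id>.dkr.ecr.<region>.amazonaws.com/djl-trtllm-custom:latest"

model = Model(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "HuggingFaceH4/zephyr-7b-beta",  # assumed Zephyr-7B checkpoint
        "OPTION_ROLLING_BATCH": "trtllm",               # assumed rolling-batch backend
    },
)

# Deploy to a real-time endpoint; the instance type is an assumption
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.12xlarge")
```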

## Description Support for Prometheus metrics added in 0.27 is really helpful for my use case, but it can still be improved in my opinion. A couple of issues in...

enhancement
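
For context, one way to consume these metrics is a standard Prometheus scrape job; a minimal sketch, where the metrics path and port are assumptions about a local djl-serving instance rather than documented values:

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: djl-serving
    metrics_path: /metrics           # assumed path exposing the Prometheus metrics
    static_configs:
      - targets: ["localhost:8080"]  # assumed djl-serving port
```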

## Description I followed the recipe given [here](https://docs.djl.ai/docs/serving/serving/docs/lmi/tutorials/trtllm_manual_convert_tutorial.html) to manually convert teknium/OpenHermes-2.5-Mistral-7B to TensorRT on SageMaker's `ml.g5.4xlarge` and deploy the compiled model, saved on S3, to a SageMaker endpoint using `ml.g5.2xlarge`...

bug
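
A minimal `serving.properties` sketch for pointing the container at pre-compiled TensorRT-LLM artifacts on S3; the bucket path is hypothetical and the other options are assumptions rather than the exact values from the tutorial:

```properties
engine=MPI
# Hypothetical S3 prefix containing the compiled TensorRT-LLM engine files
option.model_id=s3://my-bucket/trtllm/openhermes-2.5-mistral-7b/
option.rolling_batch=trtllm
# ml.g5.2xlarge has a single GPU
option.tensor_parallel_degree=1
```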

## Description djl-serving version: djl-inference:0.26.0-tensorrtllm0.7.1 models: - meta-llama/Llama-2-7b-chat (see https://huggingface.co/meta-llama/Llama-2-7b-chat, the one used in this report) - meta-llama/Llama-2-7b-chat-hf (see https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) error: java.util.concurrent.CompletionException: ai.djl.engine.EngineException: Model conversion process failed! AWS instance: g5.12xlarge (= 4 NVIDIA...

bug
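
Since g5.12xlarge exposes four GPUs, a just-in-time conversion setup would typically shard the model across all of them. A minimal sketch of the relevant `serving.properties` entries, with values assumed for illustration rather than taken from the report:

```properties
engine=MPI
option.model_id=meta-llama/Llama-2-7b-chat-hf
option.rolling_batch=trtllm
# g5.12xlarge provides 4 NVIDIA A10G GPUs
option.tensor_parallel_degree=4
```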