djl-serving

A universal scalable machine learning model deployment solution

Results: 54 djl-serving issues

## Description ## Achieve better handling at problematic points

## Description ## Add TGI compatibility support

## Description ## - Extract rolling batch handling out of huggingface.py; the UX won't change, and the handler will still be `djl_python.huggingface`
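
For context, the built-in handler is selected in `serving.properties`. A minimal sketch, where the model ID and rolling-batch backend are placeholders rather than values from the PR:

```properties
engine=Python
# Handler entry point stays the same after the refactor
option.entryPoint=djl_python.huggingface
# Hypothetical model; a Hugging Face model ID or S3 path could go here
option.model_id=my-org/my-model
option.rolling_batch=auto
```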

## Description ## This PR introduces a new Actions workflow for running integration tests that will verify the functionality of LMI DLCs when used in a SageMaker Neo...

## Description ## Brief description of what this PR is about - If this change is a backward incompatible change, why must this change be made? - Interesting edge cases...

## Description Running djl-inference:0.27.0-neuronx-sdk2.18.1 with the Hugging Face model google/gemma-7b-it fails. ### Error Message WARN PyProcess W-93-model-stderr: --- Logging error --- WARN PyProcess W-93-model-stderr: Traceback (most recent call last): WARN PyProcess W-93-model-stderr:...

bug

## Description (A clear and concise description of what the bug is.) I am building the DJL-Serving TensorRT-LLM LMI inference container from scratch, and deploying on SageMaker Endpoints for Zephyr-7B...

bug
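
For readers reproducing a deployment like the one above, here is a minimal sketch using the SageMaker Python SDK; the ECR image URI, model ID, environment variables, and instance type are assumptions for illustration, not values from the report:

```python
import sagemaker
from sagemaker.model import Model

role = sagemaker.get_execution_role()

# Hypothetical URI of a custom-built LMI TensorRT-LLM image pushed to ECR
image_uri = "<account-id>.dkr.ecr.<region>.amazonaws.com/djl-trtllm-custom:latest"

model = Model(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "HuggingFaceH4/zephyr-7b-beta",  # assumed Zephyr-7B checkpoint
        "OPTION_ROLLING_BATCH": "trtllm",               # assumed rolling-batch backend
    },
)

# Deploy to a real-time endpoint; the instance type is an assumption
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.12xlarge")
```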

## Description Support for Prometheus metrics added in 0.27 is really helpful for my use case, but it can still be improved in my opinion. A couple of issues in...

enhancement
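
For context, one way to consume these metrics is a standard Prometheus scrape job; a minimal sketch, where the metrics path and port are assumptions about a local djl-serving instance rather than documented values:

```yaml
# prometheus.yml (fragment)
scrape_configs:
  - job_name: djl-serving
    metrics_path: /metrics           # assumed path exposing the Prometheus metrics
    static_configs:
      - targets: ["localhost:8080"]  # assumed djl-serving port
```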

## Description I followed the recipe given [here](https://docs.djl.ai/docs/serving/serving/docs/lmi/tutorials/trtllm_manual_convert_tutorial.html) to manually convert teknium/OpenHermes-2.5-Mistral-7B to TensorRT on SageMaker's `ml.g5.4xlarge` and deploy the compiled model, saved on S3, to a SageMaker endpoint using `ml.g5.2xlarge`...

bug
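
A minimal `serving.properties` sketch for pointing the container at pre-compiled TensorRT-LLM artifacts on S3; the bucket path is hypothetical and the other options are assumptions rather than the exact values from the tutorial:

```properties
engine=MPI
# Hypothetical S3 prefix containing the compiled TensorRT-LLM engine files
option.model_id=s3://my-bucket/trtllm/openhermes-2.5-mistral-7b/
option.rolling_batch=trtllm
# ml.g5.2xlarge has a single GPU
option.tensor_parallel_degree=1
```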

## Description djl-serving version: djl-inference:0.26.0-tensorrtllm0.7.1 models: - meta-llama/Llama-2-7b-chat (see https://huggingface.co/meta-llama/Llama-2-7b-chat, the one used in this report) - meta-llama/Llama-2-7b-chat-hf (see https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) error: java.util.concurrent.CompletionException: ai.djl.engine.EngineException: Model conversion process failed! AWS instance: g5.12xlarge (= 4 NVIDIA...

bug
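
Since g5.12xlarge exposes four GPUs, a just-in-time conversion setup would typically shard the model across all of them. A minimal sketch of the relevant `serving.properties` entries, with values assumed for illustration rather than taken from the report:

```properties
engine=MPI
option.model_id=meta-llama/Llama-2-7b-chat-hf
option.rolling_batch=trtllm
# g5.12xlarge provides 4 NVIDIA A10G GPUs
option.tensor_parallel_degree=4
```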