djl-serving

A universal scalable machine learning model deployment solution

Results 54 djl-serving issues

Description: `java.util.concurrent.CompletionException: java.lang.IllegalArgumentException: Deep learning engine not found: MPI`

We hit this error when trying the configuration below with deepjavalibrary/djl-serving:0.24.0-deepspeed.

serving.properties:

```
engine=MPI
option.model_id=starcoderbase
option.trust_remote_code=true
option.tensor_parallel_degree=1
option.max_rolling_batch_size=32
option.rolling_batch=auto
option.output_formatter=jsonlines
option.paged_attention=false
option.enable_streaming=true
```

Log: INFO...

bug

## Description

### Serving example (CV)

This section asks you to download

```
curl -O https://mlrepo.djl.ai/model/cv/image_classification/ai/djl/pytorch/resnet18_embedding/0.0.1/resnet18_embedding.zip
```

The serving.properties in that file has typos (trailing commas):

```
engine=PyTorch
width=224,
height=224,
resize=256,
centerCrop=true,
normalize=true,
translatorFactory=ai.djl.modality.cv.translator.ImageFeatureExtractorFactory...
```
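For reference, a cleaned-up serving.properties with the trailing commas removed would presumably look like the sketch below (keys taken verbatim from the report; the last line is truncated there and is left as-is):

```
engine=PyTorch
width=224
height=224
resize=256
centerCrop=true
normalize=true
translatorFactory=ai.djl.modality.cv.translator.ImageFeatureExtractorFactory...
```

Java-properties parsers treat the comma as part of the value (e.g. width would be read as `224,`), which is likely why the file fails as shipped.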

bug

## Description

The main project page has a list of key features; I am specifically interested in performance:

* Performance - DJL Serving runs multithreaded inference in a single JVM. Our benchmark...

enhancement

## Description

We are using SageMaker for large model inference (LMI), as documented [here](https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-dlc.html).

With this notebook https://github.com/deepjavalibrary/djl-demo/blob/master/aws/sagemaker/large-model-inference/sample-llm/rollingbatch_llama_7b_customized_preprocessing.ipynb, we saw there is a way to manipulate the input before it...
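To illustrate the kind of input manipulation the linked notebook demonstrates, here is a minimal sketch. The real LMI handler receives a `djl_python` Input object; a plain dict stands in here so the sketch runs standalone (that substitution, and the template format, are assumptions, not the actual djl-serving API):

```python
# Sketch of customized preprocessing before inference.
# A plain dict stands in for djl_python's Input/Output objects
# so this runs standalone (an assumption, not the real API).

def preprocess(payload: dict) -> dict:
    """Wrap the raw prompt in a hypothetical instruction template."""
    prompt = payload.get("inputs", "")
    params = payload.get("parameters", {})
    # Manipulate the input before it reaches the model:
    return {"inputs": f"[INST] {prompt} [/INST]", "parameters": params}

result = preprocess({"inputs": "Hello", "parameters": {"max_new_tokens": 64}})
print(result["inputs"])  # [INST] Hello [/INST]
```

In the real handler, the same transformation would run inside the model's entry-point script before the payload is handed to the rolling-batch engine.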

enhancement

## Description

Trying to use the sample notebooks below from djl-demo results in an error, since a 0.24.0 image of djl-deepspeed isn't available:

https://github.com/deepjavalibrary/djl-demo/blob/master/aws/sagemaker/large-model-inference/sample-llm/vllm_deploy_llama_13b.ipynb
https://github.com/deepjavalibrary/djl-demo/blob/master/aws/sagemaker/large-model-inference/sample-llm/rollingbatch_deploy_llama2-13b-gptq.ipynb

@lanking520 and others, what's the...

bug

## Description

We have deployed a Salesforce codegen-2b-multi model on NVIDIA GPU infrastructure with the following serving.properties:

```
engine=MPI
option.rolling_batch=lmi-dist  # tested with both lmi-dist and auto
option.max_rolling_batch_size=8
option.max_rolling_batch_prefill_tokens=1088
option.paged_attention=false
option.model_loading_timeout...
```

bug

## Description

I tried running each entrypoint to compare their performance on SageMaker, but the server failed to start with the deepspeed entrypoint. This is my serving.properties:

```
engine=Python...
```

bug

I am following the code in the AWS documentation to host GPT-J-6B using DJL Serving: https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/pytorch_deploy_large_GPT_model/GPT-J-6B-model-parallel-inference-DJL.ipynb

Providing a tensor parallelism value of 2 in serving.properties creates 2...
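For context, a serving.properties enabling tensor parallelism over two GPUs typically contains something like the sketch below (a minimal illustration; the engine choice is an assumption based on the notebook's DeepSpeed-based setup, not quoted from the report):

```
engine=DeepSpeed
option.tensor_parallel_degree=2
```

With `option.tensor_parallel_degree=2`, DJL Serving partitions the model across two devices and runs one worker process per partition, which may account for the two processes the report goes on to describe.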

How can I run inference with multiple models, for example PaddleOCR, which consists of three models: det, cls, and rec?

![image](https://github.com/deepjavalibrary/djl-serving/assets/16932704/29b5b51d-f217-42f7-8e87-55e9c4327fd6)

![image](https://github.com/deepjavalibrary/djl-serving/assets/35183358/ff8727ec-a8b4-42bf-814e-9670271c2d36)

bug