djl-serving

A universal scalable machine learning model deployment solution

Results 54 djl-serving issues

Description: `java.util.concurrent.CompletionException: java.lang.IllegalArgumentException: Deep learning engine not found: MPI`

We hit this error when trying the configuration below with deepjavalibrary/djl-serving:0.24.0-deepspeed.

serving.properties:

```
engine=MPI
option.model_id=starcoderbase
option.trust_remote_code=true
option.tensor_parallel_degree=1
option.max_rolling_batch_size=32
option.rolling_batch=auto
option.output_formatter=jsonlines
option.paged_attention=false
option.enable_streaming=true
```

Log: INFO...

bug

## Description

### Serving example (CV)

This section asks you to download

```
curl -O https://mlrepo.djl.ai/model/cv/image_classification/ai/djl/pytorch/resnet18_embedding/0.0.1/resnet18_embedding.zip
```

The serving.properties in that file has typos (trailing commas):

```
engine=PyTorch
width=224,
height=224,
resize=256,
centerCrop=true,
normalize=true,
translatorFactory=ai.djl.modality.cv.translator.ImageFeatureExtractorFactory...
```
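For reference, a cleaned-up serving.properties with the trailing commas removed would presumably look like the sketch below (keys taken verbatim from the report; the last line is truncated there and is left as-is):

```
engine=PyTorch
width=224
height=224
resize=256
centerCrop=true
normalize=true
translatorFactory=ai.djl.modality.cv.translator.ImageFeatureExtractorFactory...
```

Java-properties parsers treat the comma as part of the value (e.g. width would be read as `224,`), which is likely why the file fails as shipped.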

bug

## Description

The main project page has a list of key features; I am specifically interested in performance:

* Performance - DJL Serving runs multithreaded inference in a single JVM. Our benchmark...

enhancement

## Description

We are using SageMaker for large model inference (LMI), as documented [here](https://docs.aws.amazon.com/sagemaker/latest/dg/large-model-inference-dlc.html).

With this notebook https://github.com/deepjavalibrary/djl-demo/blob/master/aws/sagemaker/large-model-inference/sample-llm/rollingbatch_llama_7b_customized_preprocessing.ipynb, we saw there is a way to manipulate the input before it...
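To illustrate the kind of input manipulation the linked notebook demonstrates, here is a minimal sketch. The real LMI handler receives a `djl_python` Input object; a plain dict stands in here so the sketch runs standalone (that substitution, and the template format, are assumptions, not the actual djl-serving API):

```python
# Sketch of customized preprocessing before inference.
# A plain dict stands in for djl_python's Input/Output objects
# so this runs standalone (an assumption, not the real API).

def preprocess(payload: dict) -> dict:
    """Wrap the raw prompt in a hypothetical instruction template."""
    prompt = payload.get("inputs", "")
    params = payload.get("parameters", {})
    # Manipulate the input before it reaches the model:
    return {"inputs": f"[INST] {prompt} [/INST]", "parameters": params}

result = preprocess({"inputs": "Hello", "parameters": {"max_new_tokens": 64}})
print(result["inputs"])  # [INST] Hello [/INST]
```

In the real handler, the same transformation would run inside the model's entry-point script before the payload is handed to the rolling-batch engine.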

enhancement

## Description

Trying to use the sample notebooks below from djl-demo results in an error, since a 0.24.0 image of djl-deepspeed isn't available:

https://github.com/deepjavalibrary/djl-demo/blob/master/aws/sagemaker/large-model-inference/sample-llm/vllm_deploy_llama_13b.ipynb
https://github.com/deepjavalibrary/djl-demo/blob/master/aws/sagemaker/large-model-inference/sample-llm/rollingbatch_deploy_llama2-13b-gptq.ipynb

@lanking520 and others, what's the...

bug

## Description

We have deployed a Salesforce codegen-2b-multi model on NVIDIA GPU infrastructure with the following serving.properties:

```
engine=MPI
option.rolling_batch=lmi-dist  # tested with both lmi-dist and auto
option.max_rolling_batch_size=8
option.max_rolling_batch_prefill_tokens=1088
option.paged_attention=false
option.model_loading_timeout...
```

bug

## Description

I tried running each entrypoint to compare their performance on SageMaker, but the server failed to start with the deepspeed entrypoint. This is my serving.properties:

```
engine=Python...
```

bug

I am following the code in the AWS documentation to host GPT-J-6B using DJL Serving: https://github.com/aws/amazon-sagemaker-examples/blob/main/advanced_functionality/pytorch_deploy_large_GPT_model/GPT-J-6B-model-parallel-inference-DJL.ipynb

Providing a tensor parallelism value of 2 in serving.properties creates 2...
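For context, a serving.properties enabling tensor parallelism over two GPUs typically contains something like the sketch below (a minimal illustration; the engine choice is an assumption based on the notebook's DeepSpeed-based setup, not quoted from the report):

```
engine=DeepSpeed
option.tensor_parallel_degree=2
```

With `option.tensor_parallel_degree=2`, DJL Serving partitions the model across two devices and runs one worker process per partition, which may account for the two processes the report goes on to describe.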

How can I run inference with multiple models, for example PaddleOCR, which consists of three models: det, cls, and rec?

![image](https://github.com/deepjavalibrary/djl-serving/assets/16932704/29b5b51d-f217-42f7-8e87-55e9c4327fd6)

![image](https://github.com/deepjavalibrary/djl-serving/assets/35183358/ff8727ec-a8b4-42bf-814e-9670271c2d36)

bug