djl-serving
A universal scalable machine learning model deployment solution
## Description Adds passthrough support for generation config for on-device embedding.
## Description Getting junk values and fewer generated tokens for the starcoderbase model with rolling batch type vllm. The accuracy of the generated text is also low. ## Expected...
This PR contains a renovated Actions setup for DJL Serving as promised in https://github.com/deepjavalibrary/djl-serving/pull/1264. The major change is to create a new nightly orchestration action that will call the build,...
## Description Can't install djlbench on the aarch64/arm64 Ubuntu platform using the snap installer. However, it can be installed by explicitly downloading...
## Description Do you intend to add [Attention Sinks](https://github.com/huggingface/transformers/commit/633215ba58fe5114d8c8d32e415a04600e010701) streaming as an alternative to the current streaming implementations for the huggingface, vllm and scheduler rolling batch modes?
I am trying to understand whether I am using vLLM for my deployment here with the following setting: `option.rolling_batch = auto` I can't seem to find whether it...
# Requirement Description DJL Serving support for microservice registries, such as Nacos. The address of the registry can be specified at runtime, and DJL Serving automatically registers with the microservice registry....
## Description When streaming is enabled with Llama2 or Mistral models (models using LlamaTokenizer), the output is missing whitespace between words. For example, it produces text like `DaenerysistheKhaleesi` ### Expected Behavior Streaming output...
This is a refactor to simplify the handling of tensor parallel degree. Previously, it was read independently in 3+ locations in the code, and the behavior determining the tpDegree is hard...
## Description Token streaming not working with rolling batch ### Expected Behavior (what's the expected behavior?) ### Error Message ## How to Reproduce? (If you developed your own code,...