Eduard Zl

7 issue results for Eduard Zl

Hello. First, I would like to give some big positive feedback to Hortonworks for the Streamline product. :) Now... I have a case where the output of the Kafka source component is a...

### Your current environment vLLM 0.5.4, CUDA 12.4, flashinfer-0.1.5, A100 GPU ### 🐛 Describe the bug I am using the latest vLLM release (0.5.4). I installed the "flashinfer" attention backend: https://github.com/flashinfer-ai/flashinfer/releases/download/v0.1.5/flashinfer-0.1.5+cu124torch2.4-cp310-cp310-linux_x86_64.whl The inference...

bug

Hello. Is there any way I can set an environment variable for the djl-serving container using the serving.properties file? I would like to set VLLM_USE_V1=0, but by modifying serving.properties...
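For context on why a flag like VLLM_USE_V1 usually has to reach the container's process environment (e.g. via `docker run -e VLLM_USE_V1=0 ...`) rather than a properties file, here is a minimal sketch; `use_v1_engine` is a hypothetical reader, not vLLM's actual parsing logic:

```python
import os

# Sketch (assumption): engines typically read feature flags like
# VLLM_USE_V1 from the process environment at startup. If the serving
# layer does not forward serving.properties entries into the
# environment, the flag must be set when the container is launched.
def use_v1_engine(environ=os.environ) -> bool:
    """Return True unless VLLM_USE_V1 is explicitly set to a falsy value."""
    value = environ.get("VLLM_USE_V1", "1").strip().lower()
    return value not in ("0", "false", "off")

print(use_v1_engine({"VLLM_USE_V1": "0"}))  # prints False
print(use_v1_engine({}))                    # prints True
```

The point of the sketch: the lookup happens against the environment of the running process, so anything not exported into that environment (for example, a line in a config file the engine never reads) has no effect.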

## Description We are using the DJL container 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.31.0-lmi13.0.0-cu124 with vLLM as the inference engine to serve Llama 3.1 - Llama 3.3 models. The model files include a "generation_config.json" file, which can specify default...

enhancement
stale
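For reference, a Hugging Face-style generation_config.json typically carries default sampling parameters such as these; the values below are illustrative, not taken from the issue:

```
{
  "do_sample": true,
  "temperature": 0.6,
  "top_p": 0.9,
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128008, 128009]
}
```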

Hello. I am using the 763104351884.dkr.ecr.us-east-1.amazonaws.com/djl-inference:0.31.0-lmi13.0.0-cu124 container to run inference for Llama3.3-70B-Instruct. The container is launched using Docker. I have created a repo dir with 2 models: a 70B model and an 8B...

stale
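A multi-model setup like the one described usually keeps each model in its own subdirectory of the directory served by djl-serving, each with its own serving.properties; the directory names below are placeholders, not from the issue:

```
model-repo/
├── llama-3.3-70b-instruct/
│   ├── serving.properties
│   └── (model weights, tokenizer files)
└── llama-8b/
    ├── serving.properties
    └── (model weights, tokenizer files)
```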

Hello. I have pulled the 763104351884.dkr.ecr.us-west-2.amazonaws.com/djl-inference:0.33.0-tensorrtllm0.21.0-cu128 container to use it for serving a Llama-based model. Here is my serving.properties file content:

```
engine=MPI
option.rolling_batch=trtllm
option.trust_remote_code=true
option.max_input_len=32768
option.max_output_len=32768
option.max_num_tokens=32768
option.max_rolling_batch_size=32
option.tensor_parallel_degree=1...
```

bug
stale

### Please check that this issue hasn't been reported before. - [x] I searched previous [Bug Reports](https://github.com/axolotl-ai-cloud/axolotl/labels/bug) and didn't find any similar reports. ### Expected Behavior I am trying to run...

bug