text-generation-inference
Large Language Model Text Generation Inference
### Feature request During testing for [https://github.com/BerriAI/litellm/pull/7747](https://github.com/BerriAI/litellm/pull/7747) , I found that the differences between OpenAI and TGI are more fundamental than just the optionality of the provided schema. The `response_format`...
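The preview above is truncated, but the shape of the mismatch it describes can be sketched. Below is a minimal, illustrative comparison of the two `response_format` payload shapes; the exact field names vary across OpenAI and TGI versions, so treat these as assumptions rather than authoritative API definitions:

```python
# Illustrative request-body shapes only (assumptions, not authoritative):
# OpenAI nests the schema under "json_schema"; TGI-style grammar constraints
# have historically passed the schema directly as "value".

schema = {
    "type": "object",
    "properties": {"answer": {"type": "string"}},
    "required": ["answer"],
}

# OpenAI-style structured output: schema is optional and nested.
openai_response_format = {
    "type": "json_schema",
    "json_schema": {"name": "reply", "schema": schema, "strict": True},
}

# TGI-style constrained output: the schema *is* the grammar.
tgi_response_format = {
    "type": "json_object",
    "value": schema,
}
```

Because the two shapes disagree on both the `type` discriminator and where the schema lives, a translation layer such as litellm cannot map them field-for-field, which is the fundamental difference the issue points at.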
### System Info I was using v2.3.1 via Docker and everything was working. After updating to later versions, including the latest, TGI no longer starts due to an error:...
### System Info Using the 3.1.0 docker container on an AWS g6.12xlarge instance. `--env` output:
```
2025-02-19T17:51:35.116359Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.84.0
Commit sha: 463228ebfc444f60fa351da34a2ba158af0fe9d8
Docker...
```
### System Info text-generation-inference docker image version 3.1.0 with the following parameters (as logged by TGI): INFO text_generation_launcher: Args { model_id: "google/flan-t5-xxl", revision: None, validation_workers: 2, sharded: None, num_shard: Some( 1,...
### System Info The Dockerfile includes a non-commercially licensed Conda installation, whereas TGI itself is under the Apache-2.0 License, so installing TGI through Docker creates a Conda license violation. https://github.com/huggingface/text-generation-inference/blob/main/Dockerfile ### Information - [x] Docker...
### System Info Docker deployment:
```
$ nvidia-smi
Thu Feb 13 23:44:10 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M |...
```
### System Info The CPU affinity implementation (introduced in commit 59922f9bc16afee9efcc7ee1c5f9d753ef314ffa, first released in v2.3.0, and still present at current HEAD (4b8cda684b45b799de01a65e3fe3422a34a621d3)) ignores any pre-existing CPU pinning for the process. ### Information - [x]...
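The expectation in the report above can be sketched in a few lines: intersect any desired pinning with the mask the process already has, rather than overwriting it. This is a hypothetical illustration of the principle (Linux-only `os.sched_*` calls), not TGI's actual code:

```python
import os

# CPUs the process is ALREADY pinned to (e.g. by taskset, cgroups, or a
# job scheduler) -- the mask the report says should be respected.
allowed = os.sched_getaffinity(0)

# Hypothetical per-shard CPU set an affinity optimizer might want to use.
desired = set(range(0, 4))

# Never escape the existing mask: pin only to the intersection, and fall
# back to the existing mask if the intersection is empty.
effective = (desired & allowed) or allowed
os.sched_setaffinity(0, effective)
```

Intersecting rather than replacing means an operator's explicit pinning always remains an upper bound on where the process may run.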
### Feature request Any chance we could get support for the RTX 3090? ### Motivation I have an RTX 3090 and would like to utilize it. ### Your contribution I'm not...
### System Info text-generation-inference 3.1.0 (saw the same issue on 3.0.0):
```shell
model="NousResearch/Meta-Llama-3.1-8B-Instruct"
volume="$PWD/data"
docker create --name llama3.1-speculate2 --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:3.1.0 --model-id $model --quantize...
```
### System Info I'm trying to deploy Qwen/Qwen2-VL-2B-Instruct on an MI210 using ghcr.io/huggingface/text-generation-inference:3.1.0-rocm, but it fails during the warmup step with this error:
```
INFO text_generation_launcher: Using attention paged - Prefix caching...
```