text-generation-inference issues

Sharding Error with max_total_tokens and max_input_tokens options in Gemma3-27B-it model

### Description: I am encountering a sharding error when using the gemma-3-27b-it model with the max_total_tokens and max_input_tokens options. The model works without these options, but fails to initialize when...

calycekr

Add support for phi-4-mini and phi-4-multimodal

3

### Feature request Currently, neither the phi-4-mini model nor the phi-4-multimodal model can be loaded with TGI version 3.1.1. Phi-4-mini needs transformers version 4.49.0...

farzanehnakhaee70

Moondream2 | TGI Model support | Intel GPU

1

### Feature request Hi, it seems the [list](https://huggingface.co/docs/text-generation-inference/en/supported_models) of supported models currently doesn't include Moondream2. Is support for it in the pipeline? Meanwhile, can you also point...

rskasturi

Support xccl distributed backend

4

Starting from `torch>=2.7`, the XCCL distributed backend is available for XPU devices (requires torch built with `USE_XCCL=1`). This commit is verified on Intel Data Center GPU Max with Bloom: ``` text-generation-launcher...

dvrogozh

Multi-node inference

### Feature request Hi, I wondered if it's possible to do multi-node inference? In my case I would like to run DeepSeek-V3 671B inference on 9 or more H100s. I...

hrbigelow

Add 'json_schema' alias to GrammarType.Json

1

Adds a serde alias to `GrammarType.Json` such that `{"type": "json_schema", "value": json_schema}` deserializes correctly. This brings the OpenAI-compatible API closer to [OpenAI's official spec](https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format) and makes the server more interoperable...

aW3st
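As a rough illustration of the grammar object this PR describes, the sketch below builds the `{"type": "json_schema", "value": json_schema}` shape in Python. The schema contents and the `build_grammar` helper are hypothetical, and where the object sits inside a request is not shown in the excerpt above, so only the object itself is constructed here.

```python
import json

# Hypothetical JSON Schema, used only for illustration.
person_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}


def build_grammar(schema, type_name="json_schema"):
    """Build a grammar object in the {"type": ..., "value": ...} shape.

    build_grammar is a hypothetical helper, not part of TGI.
    """
    return {"type": type_name, "value": schema}


# Spelling introduced by the alias described in the PR.
grammar = build_grammar(person_schema)

# Serialize as it would appear in a JSON request body.
print(json.dumps(grammar, indent=2))
```

With the alias in place, the PR states that this `json_schema` spelling deserializes correctly into `GrammarType.Json`.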

Added model name label to metrics and added an optional argument --served-model-name

# What does this PR do? Fixes: 1. model_name label is absent in metrics. Fixed this by adding the label in all the LLM metrics. 2. Added an optional argument...

yashaswipiplani

wontfix

TGI metrics don't have model_name label to indicate which model the metrics belong to

1

### Feature request The exposed Prometheus metrics lack labels such as model_name to filter or group by model. This creates an issue when multiple models are deployed using TGI, causing...

yashaswipiplani

wontfix

Qwen2-VL failed to infer multiple images (Server error: upper bound and larger bound inconsistent with step sign)

7

### System Info ![image](https://github.com/user-attachments/assets/41473025-b60c-4971-839a-e26ac002baea) ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ] My own modifications ###...

AHEADer

Llama 3.3 70B: weird, gibberish outputs in production setup

7

### System Info **Runtime environment:** - Kubernetes Cluster deployment - 4 A100 GPU with 80GB VRAM each - 12 CPU with 32 GB RAM each - **TGI Version**: 3.0.1 (have...

andresC98
