
Large Language Model Text Generation Inference

639 text-generation-inference issues

Description: I am encountering a sharding error when using the gemma-3-27b-it model with the `max_total_tokens` and `max_input_tokens` options. The model works without these options, but fails to initialize when...

Feature request: Currently, neither the phi-4-mini model nor the phi-4-multimodal model can be loaded with TGI version 3.1.1. Phi-4-mini needs transformers version 4.49.0...

Feature request: Hi, it seems that the [list](https://huggingface.co/docs/text-generation-inference/en/supported_models) of supported models currently doesn't include Moondream2. Is support for it in the pipeline? Meanwhile, can you also point...

Starting from `torch>=2.7`, the XCCL distributed backend is available for XPU devices (it requires torch built with `USE_XCCL=1`). This commit was verified on Intel Data Center GPU Max with Bloom: `text-generation-launcher`...

Feature request: Hi, I wondered whether multi-node inference is possible. In my case, I would like to run DeepSeek-V3 671B inference on 9 or more H100s. I...
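The "9 or more H100s" figure is consistent with a back-of-the-envelope estimate of weight memory alone. The sketch below assumes FP8 weights (1 byte per parameter) and 80 GB of memory per H100, and it deliberately ignores KV cache, activations, and runtime overhead, so the result is only a lower bound. `min_gpus_for_weights` is a hypothetical helper, not part of TGI.

```python
import math


def min_gpus_for_weights(num_params: float, bytes_per_param: float, gpu_mem_bytes: float) -> int:
    """Lower bound on the GPU count needed just to hold the model weights.

    Hypothetical helper: ignores KV cache, activations, and framework
    overhead, so real deployments need more headroom than this.
    """
    weight_bytes = num_params * bytes_per_param
    return math.ceil(weight_bytes / gpu_mem_bytes)


# Assumptions: 671B parameters, 80 GB per H100.
fp8 = min_gpus_for_weights(671e9, 1, 80e9)   # FP8: 1 byte per parameter
bf16 = min_gpus_for_weights(671e9, 2, 80e9)  # BF16: 2 bytes per parameter, for comparison
print(fp8, bf16)
```

Under these assumptions the FP8 weights alone need 9 GPUs (about 671 GB / 80 GB, rounded up), and BF16 weights would need 17. A typical H100 server holds 8 GPUs, so even the FP8 lower bound does not fit on a single node.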

Adds a serde alias to `GrammarType.Json` such that `{"type": "json_schema", "value": json_schema}` deserializes correctly. This brings the OpenAI-compatible API closer to [OpenAI's official spec](https://platform.openai.com/docs/api-reference/chat/create#chat-create-response_format) and makes the server more interoperable...
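The effect of the alias described above can be mirrored in a short stdlib-only Python sketch. It assumes the grammar object is adjacently tagged (a `type` field plus a `value` field, as in the example payload) and that `json` is the existing canonical tag for `GrammarType.Json`; the `regex` variant is included only for contrast and is likewise an assumption. In Rust this is what serde's `#[serde(alias = "...")]` attribute provides; the Python below only imitates that behavior and is not TGI code. `parse_grammar` and `TAG_ALIASES` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class JsonGrammar:
    schema: Any  # the JSON schema value carried in "value"


@dataclass
class RegexGrammar:
    pattern: str


# Hypothetical tag table: canonical tags plus aliases that resolve to the same variant.
TAG_ALIASES = {
    "json": "json",         # assumed canonical tag
    "json_schema": "json",  # the alias this PR adds, per its description
    "regex": "regex",       # assumed second variant, for contrast
}


def parse_grammar(raw: str) -> Union[JsonGrammar, RegexGrammar]:
    """Decode an adjacently tagged {"type": ..., "value": ...} object."""
    obj = json.loads(raw)
    tag = TAG_ALIASES.get(obj.get("type"))
    if tag is None:
        raise ValueError(f"unknown grammar type: {obj.get('type')!r}")
    if tag == "json":
        return JsonGrammar(schema=obj["value"])
    return RegexGrammar(pattern=obj["value"])


schema = {"type": "object", "properties": {"name": {"type": "string"}}}
a = parse_grammar(json.dumps({"type": "json", "value": schema}))
b = parse_grammar(json.dumps({"type": "json_schema", "value": schema}))
print(a == b)
```

Both payloads decode to the same `JsonGrammar`, which is the interoperability point: OpenAI-style clients that send `json_schema` no longer need a TGI-specific tag.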

What does this PR do? Fixes: 1. The `model_name` label is absent from metrics; this is fixed by adding the label to all the LLM metrics. 2. Added an optional argument...

wontfix
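To illustrate what a per-model label buys, here is a stdlib-only sketch that renders a counter in the Prometheus text exposition format with a `model_name` label, so that series from different models can be filtered or grouped. The metric name `example_requests_total`, the `LabeledCounter` class, and the model names used as label values are placeholders for illustration; this is not TGI's metrics code.

```python
from collections import defaultdict


def escape_label_value(value: str) -> str:
    # The Prometheus text format requires escaping backslash, double quote, and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


class LabeledCounter:
    """Minimal counter keyed by model_name (hypothetical, for illustration)."""

    def __init__(self, name: str, help_text: str):
        self.name = name
        self.help_text = help_text
        self.values = defaultdict(float)

    def inc(self, model_name: str, amount: float = 1.0) -> None:
        self.values[model_name] += amount

    def render(self) -> str:
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} counter"]
        for model, value in sorted(self.values.items()):
            # One series per model, distinguished only by the model_name label.
            lines.append(f'{self.name}{{model_name="{escape_label_value(model)}"}} {value:g}')
        return "\n".join(lines) + "\n"


requests = LabeledCounter("example_requests_total", "Requests served, per model.")
requests.inc("phi-4-mini")
requests.inc("phi-4-mini")
requests.inc("gemma-3-27b-it")
print(requests.render())
```

With such a label in place, a PromQL query like `sum by (model_name) (rate(example_requests_total[5m]))` can aggregate traffic per model when several TGI deployments are scraped into the same Prometheus.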

Feature request: The exposed Prometheus metrics lack labels such as `model_name` for filtering or grouping by model. This creates an issue when multiple models are deployed using TGI, causing...

wontfix

System Info: ![image](https://github.com/user-attachments/assets/41473025-b60c-4971-839a-e26ac002baea) Information: [X] Docker, [ ] The CLI directly. Tasks: [X] An officially supported command, [ ] My own modifications...

System Info: Runtime environment: Kubernetes cluster deployment; 4 A100 GPUs with 80 GB VRAM each; 12 CPUs with 32 GB RAM each; TGI version: 3.0.1 (have...