text-generation-inference issues

Failure when starting the model using TGI 3

### System Info I tried to serve llama3.1-8b using TGI on an A10 (24G) with a 4k context length. Command: ``` docker run --gpus all -it --rm -p 8000:80 ghcr.io/huggingface/text-generation-inference:3.0.0 --model-id NousResearch/Meta-Llama-3.1-8B-Instruct...

hahmad2008

Install `text-generation-server` from `poetry.lock` export

# What does this PR do? This PR installs the `text-generation-server` Python requirements from an exported `requirements.txt`-like file generated out of the `poetry.lock`, to be able to reuse the generated...

alvarobartt

Update tensor_parallel.py

1

Resolve the issue of abnormal conversation performance in the Baichuan large model. # Fix the bug in the norm_head adaptation for Baichuan. Fixes https://github.com/huggingface/text-generation-inference/issues/2780 https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat/blob/main/modeling_baichuan.py#:~:text=self.weight.data%20%3D%20nn.functional.normalize(self.weight) ![image](https://github.com/user-attachments/assets/76a821b6-e998-43d3-b0f6-ebc1f7614c00) @OlivierDehaene OR @Narsil

Lacacy

BUILD_EXTENSIONS=False make install error！！！

2

### System Info none ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My own...

tangliangwu

[broken-compatibility] chat completion breaks base64 standard / OpenAI spec

### System Info latest docker pull, --version says: `text-generation-launcher 3.0.0` model used: https://huggingface.co/AI-Safeguard/Ivy-VL-llava ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially...

lucyknada

Exposes TensorRT-LLM finish reason to the server

mfuntowicz

[WIP] Add gfx1100 support to AMD pytorch build

2

# What does this PR do? Fixes #2641 ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if...

cazlo

Complex response format leads the container to run forever on CPU

1

### System Info System: `Linux 4.18.0-553.22.1.el8_10.x86_64 #1 SMP Wed Sep 25 09:20:43 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux` `Rocky Linux 8.10` Model: `mistralai/Mistral-Nemo-Instruct-2407` Hardware: * GPU: `NVIDIA A100-SXM4-80GB` * CPU:...

Rictus

On-The-Fly Quantization for Inference appears not to be working as per documentation.

7

### System Info **Platform:** Dell 760xa with 4x L40S GPUs **OS Description:** Ubuntu 22.04.5 LTS **GPU:** NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 **Python:** 3.10.12 **Docker:** 26.1.5 **Model:** [Deploy...

colin-byrneireland1

Triton Error [CUDA]

4

### System Info docker version: ghcr.io/huggingface/text-generation-inference:sha-d2ed52f model: Qwen2.5-1.5B-Instruct (tested on Qwen2.5-32B-Instruct as well) ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially...

paulcx
