
Large Language Model Text Generation Inference

Results: 639 text-generation-inference issues

### System Info ``` Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.80.0 Commit sha: a094729386b5689aabfba40b7fdb207142dec8d5 Docker label: sha-a094729 nvidia-smi: Mon Oct 21 10:38:14 2024 +-----------------------------------------------------------------------------------------+ | NVIDIA-SMI 550.54.14 Driver Version: 550.54.14...

### System Info latest docker ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My own...

### System Info Which image should I use on a MacBook Pro? I can't find an arm64 image. Please see the error below: ``` 1 warning found (use docker --debug...

### System Info - text-generation-inference:2.3.0, deployed on docker - model info: { "model_id": "meta-llama/Llama-3.1-8B-Instruct", "model_sha": "0e9e39f249a16976918f6564b8830bc894c89659", "model_pipeline_tag": "text-generation", "max_concurrent_requests": 128, "max_best_of": 2, "max_stop_sequences": 4, "max_input_tokens": 5000, "max_total_tokens": 6024, "validation_workers": 2,...
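The limits in the config above imply a fixed decode budget per request. A quick arithmetic check, assuming TGI's usual semantics where `max_total_tokens` bounds prompt plus generated tokens:

```python
# Values from the /info config above
max_input_tokens = 5000
max_total_tokens = 6024

# Tokens left for generation when the prompt uses the full input budget
decode_budget = max_total_tokens - max_input_tokens
print(decode_budget)  # 1024
```

So a request that fills the entire 5000-token input window can generate at most 1024 new tokens before hitting `max_total_tokens`.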

### System Info Specifically: bartowski/NemoMix-Unleashed-12B-GGUF/NemoMix-Unleashed-12B-Q4_K_M.gguf I tried: ``` command: --model-id bartowski/NemoMix-Unleashed-12B-GGUF/NemoMix-Unleashed-12B-Q4_K_M.gguf ``` but it failed. ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [...

### Feature request Add support for the gfx1101 and gfx1100 GPUs. Currently the [official docs indicate lack of support for this hardware](https://huggingface.co/docs/text-generation-inference/en/installation_amd). ### Motivation Allow developers who have a 7900xt...

### System Info Docker Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.80.0 Commit sha: 169178b937d0c4173b0fdcd6bf10a858cfe4f428 Docker label: sha-169178b nvidia-smi Args { model_id: "/share/base_model/Mistral-Nemo-Instruct-2407-GPTQ", revision: None, validation_workers: 2, sharded: None, num_shard: None,...

### Model description I'm creating this issue to gauge how interested people are in having the NVLM model added to TGI. If you would like to see it added, please...

### System Info TGI Docker Image: `ghcr.io/huggingface/text-generation-inference:sha-11d7af7-rocm` MODEL: meta-llama/Llama-3.1-405B-Instruct Hardware used: Intel® Xeon® Platinum 8470 2G, 52C/104T, 16GT/s, 105M Cache, Turbo, HT (350W) [x2] AMD MI300X GPU OAM 192GB 750W...

(noticed this error while working on https://github.com/huggingface/huggingface_hub/pull/2556) ### System Info Using TGI through the Inference API (e.g. [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)). At the time of opening this issue, [`/info`](https://api-inference.huggingface.co/models/mistralai/Mistral-Nemo-Instruct-2407/info) returns ```js { "model_id": "mistralai/Mistral-Nemo-Instruct-2407",...
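For reference, a minimal sketch of pulling `model_id` out of a `/info` response. The payload here is reduced to the one field shown in the fragment above (the full response carries more fields), and in practice you would fetch it over HTTP (e.g. with `requests.get(...).json()`) rather than parse a literal string:

```python
import json

# Reduced /info payload; the real response contains additional fields (truncated above)
payload = '{"model_id": "mistralai/Mistral-Nemo-Instruct-2407"}'

info = json.loads(payload)
print(info["model_id"])  # mistralai/Mistral-Nemo-Instruct-2407
```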