text-generation-inference
Large Language Model Text Generation Inference
### System Info

Hi Team, when deploying the model on AWS with `huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0`, I got the above error. Could you tell me when TGI can provide the new image? Is...
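Not part of the original report, but for context, here is a minimal sketch of how such a deployment is typically wired up with the SageMaker Python SDK. The model id, IAM role, instance type, and exact image tag below are placeholders, not values taken from the issue.

```python
# Minimal sketch of deploying a TGI DLC on SageMaker (placeholders marked below).
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # assumes this runs inside SageMaker with a role attached

# Image URI mirrors the DLC family mentioned above; account, region, and tag may differ per setup.
image_uri = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0"
)

model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.3",  # placeholder model id
        "SM_NUM_GPUS": "1",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder instance type
)

print(predictor.predict({"inputs": "Hello!", "parameters": {"max_new_tokens": 16}}))
```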
### System Info

Yes, the output did not say what the error was; it just said `Server error:` and then nothing. I am using a Windows 11 environment with python 11 huggingface...
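As a generic debugging step (not from the original report), calling the server's `/generate` route directly and printing the raw status and body can reveal the detail that a blank `Server error:` message hides. The endpoint URL below is a placeholder.

```python
# Hypothetical debugging snippet: inspect the raw HTTP response from a TGI endpoint
# when a client wrapper only reports "Server error:" with no detail.
import requests

resp = requests.post(
    "http://localhost:8080/generate",  # placeholder: point this at the actual endpoint
    json={"inputs": "Hello", "parameters": {"max_new_tokens": 16}},
    timeout=60,
)
print(resp.status_code)  # HTTP status the client wrapper is swallowing
print(resp.text)         # raw error payload, if the server returned one
```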
### System Info

https://github.com/huggingface/text-generation-inference/blob/1028996fb380f07ebb2a9de1d2795e176f845c59/launcher/src/main.rs#L427-L428 I think it would be best for this limit to be unbounded by default rather than capped at 4, or at least raised to something higher like 16. Though client...
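Assuming the linked lines define the per-request input cap whose default of 4 is being questioned here, a client-side workaround (purely illustrative, not part of the issue) is to split work into chunks no larger than the server's limit before issuing requests:

```python
# Hypothetical helper: keep each request's input count at or below the server-side cap.
from typing import Iterator, List


def chunked(items: List[str], limit: int = 4) -> Iterator[List[str]]:
    """Yield successive slices of `items` with at most `limit` elements each."""
    for start in range(0, len(items), limit):
        yield items[start : start + limit]


prompts = [f"prompt {i}" for i in range(10)]
for batch in chunked(prompts, limit=4):
    # hand `batch` to whatever client issues the actual requests
    print(len(batch))  # prints 4, 4, 2
```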
# Multi-LoRA Deployment Inconsistency

## System Information

- **Model**: Fine-tuned adapter on unsloth/mistral-7b-instruct-v0.3 (LoRA rank 128)
- **Container**: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0
- **Deployment**: AWS SageMaker

## Environment

- [x] Docker
- [...
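For reproducing this kind of inconsistency, here is a hedged sketch of how a multi-LoRA TGI endpoint on SageMaker is typically invoked, selecting the adapter per request via the `adapter_id` generation parameter. The endpoint name, adapter id, and prompt are placeholders, not values from the report.

```python
# Hypothetical invocation of a multi-LoRA TGI endpoint on SageMaker (placeholders marked).
import json

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

payload = {
    "inputs": "Summarize the following ticket: ...",  # placeholder prompt
    "parameters": {
        "max_new_tokens": 128,
        "adapter_id": "my-mistral-lora",  # placeholder: which LoRA adapter to apply
    },
}

response = runtime.invoke_endpoint(
    EndpointName="tgi-multi-lora-endpoint",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```

Comparing the same prompt with and without `adapter_id` can help localize whether the drift comes from adapter routing or from the base model.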
# What does this PR do?

This PR raises the minimum requirement for bitsandbytes from v0.43.0 to the most recent release, [v0.45.0](https://github.com/bitsandbytes-foundation/bitsandbytes/releases/tag/0.45.0). CUDA Graphs support for 4bit was enabled...
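As an illustration of what the bump means in practice (not code from the PR), a small guard that refuses to run against an older bitsandbytes could look like this:

```python
# Hypothetical version guard mirroring the new minimum requirement described above.
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version

MIN_BNB = Version("0.45.0")  # new floor (previously 0.43.0)

try:
    installed = Version(version("bitsandbytes"))
except PackageNotFoundError as exc:
    raise RuntimeError("bitsandbytes is not installed") from exc

if installed < MIN_BNB:
    raise RuntimeError(f"bitsandbytes>={MIN_BNB} is required, found {installed}")
```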
# What does this PR do?

Fixes #3137

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if...
### System Info

```
+-----------------------------------------------------------------------------+
| HL-SMI Version:       hl-1.20.0-fw-58.1.1.1                                  |
| Driver Version:       1.20.0-bd87f71                                         |
| Nic Driver Version:   1.20.0-e4fe12d                                         |
|-------------------------------+----------------------+----------------------+
| AIP  Name        Persistence-M| Bus-Id        Disp.A | Volatile...
```
### Model description

Please add support for the [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) model.

### Open source status

- [ ] The model implementation is available
- [x] The model weights are available

### Provide...
### System Info

Testing on 2x 4070 TI Super

```
- MODEL_ID=unsloth/Qwen2.5-Coder-32B-bnb-4bit
- MODEL_ID=unsloth/Mistral-Small-24B-Instruct-2501-bnb-4bit
```

```
text-generation-inference-1  | [rank1]: │ /usr/src/server/text_generation_server/utils/weights.py:275 in get_sharded │
text-generation-inference-1  | [rank1]: │                                                                            │
text-generation-inference...
```
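A hedged debugging aid (not from the report): listing the tensor names, shapes, and dtypes inside the pre-quantized checkpoint shows exactly what `get_sharded` is being asked to split across the two GPUs. The shard filename below is a placeholder.

```python
# Hypothetical inspection of a downloaded safetensors shard from a bnb-4bit repo.
from safetensors import safe_open

with safe_open("model-00001-of-00002.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        print(name, tuple(tensor.shape), tensor.dtype)
```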