text-generation-inference
Large Language Model Text Generation Inference
### System Info

Hi Team, when deploying the model on AWS with `huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0`, I got the above error. Could you tell me when TGI can provide the new image? Is...
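Not part of the original report, but for context, here is a minimal sketch of how such a deployment is typically wired up with the SageMaker Python SDK. The model id, IAM role, instance type, and exact image tag below are placeholders, not values taken from the issue.

```python
# Minimal sketch of deploying a TGI DLC on SageMaker (placeholders marked below).
import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # assumes this runs inside SageMaker with a role attached

# Image URI mirrors the DLC family mentioned above; account, region, and tag may differ per setup.
image_uri = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0"
)

model = HuggingFaceModel(
    image_uri=image_uri,
    role=role,
    env={
        "HF_MODEL_ID": "mistralai/Mistral-7B-Instruct-v0.3",  # placeholder model id
        "SM_NUM_GPUS": "1",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder instance type
)

print(predictor.predict({"inputs": "Hello!", "parameters": {"max_new_tokens": 16}}))
```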
### System Info

Yes, the output did not say what the error was; it just said `Server error:` and then nothing. I am using a Windows 11 environment with python 11 huggingface...
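As a generic debugging step (not from the original report), calling the server's `/generate` route directly and printing the raw status and body can reveal the detail that a blank `Server error:` message hides. The endpoint URL below is a placeholder.

```python
# Hypothetical debugging snippet: inspect the raw HTTP response from a TGI endpoint
# when a client wrapper only reports "Server error:" with no detail.
import requests

resp = requests.post(
    "http://localhost:8080/generate",  # placeholder: point this at the actual endpoint
    json={"inputs": "Hello", "parameters": {"max_new_tokens": 16}},
    timeout=60,
)
print(resp.status_code)  # HTTP status the client wrapper is swallowing
print(resp.text)         # raw error payload, if the server returned one
```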
### System Info

https://github.com/huggingface/text-generation-inference/blob/1028996fb380f07ebb2a9de1d2795e176f845c59/launcher/src/main.rs#L427-L428 I think it would be best for this limit to be unbounded by default rather than capped at 4, or at least raised to something higher like 16. Though client...
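Assuming the linked lines define the per-request input cap whose default of 4 is being questioned here, a client-side workaround (purely illustrative, not part of the issue) is to split work into chunks no larger than the server's limit before issuing requests:

```python
# Hypothetical helper: keep each request's input count at or below the server-side cap.
from typing import Iterator, List


def chunked(items: List[str], limit: int = 4) -> Iterator[List[str]]:
    """Yield successive slices of `items` with at most `limit` elements each."""
    for start in range(0, len(items), limit):
        yield items[start : start + limit]


prompts = [f"prompt {i}" for i in range(10)]
for batch in chunked(prompts, limit=4):
    # hand `batch` to whatever client issues the actual requests
    print(len(batch))  # prints 4, 4, 2
```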
# Multi-LoRA Deployment Inconsistency

## System Information

- **Model**: Fine-tuned adapter on unsloth/mistral-7b-instruct-v0.3 (LoRA rank 128)
- **Container**: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0
- **Deployment**: AWS SageMaker

## Environment

- [x] Docker
- [...
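For reproducing this kind of inconsistency, here is a hedged sketch of how a multi-LoRA TGI endpoint on SageMaker is typically invoked, selecting the adapter per request via the `adapter_id` generation parameter. The endpoint name, adapter id, and prompt are placeholders, not values from the report.

```python
# Hypothetical invocation of a multi-LoRA TGI endpoint on SageMaker (placeholders marked).
import json

import boto3

runtime = boto3.client("sagemaker-runtime", region_name="us-east-1")

payload = {
    "inputs": "Summarize the following ticket: ...",  # placeholder prompt
    "parameters": {
        "max_new_tokens": 128,
        "adapter_id": "my-mistral-lora",  # placeholder: which LoRA adapter to apply
    },
}

response = runtime.invoke_endpoint(
    EndpointName="tgi-multi-lora-endpoint",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps(payload),
)
print(json.loads(response["Body"].read()))
```

Comparing the same prompt with and without `adapter_id` can help localize whether the drift comes from adapter routing or from the base model.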
# What does this PR do?

This PR raises the minimum requirement for bitsandbytes from v0.43.0 to the most recent release, [v0.45.0](https://github.com/bitsandbytes-foundation/bitsandbytes/releases/tag/0.45.0). CUDA Graphs support for 4bit was enabled...
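As an illustration of what the bump means in practice (not code from the PR), a small guard that refuses to run against an older bitsandbytes could look like this:

```python
# Hypothetical version guard mirroring the new minimum requirement described above.
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version

MIN_BNB = Version("0.45.0")  # new floor (previously 0.43.0)

try:
    installed = Version(version("bitsandbytes"))
except PackageNotFoundError as exc:
    raise RuntimeError("bitsandbytes is not installed") from exc

if installed < MIN_BNB:
    raise RuntimeError(f"bitsandbytes>={MIN_BNB} is required, found {installed}")
```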
# What does this PR do?

Fixes #3137

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if...
### System Info

```
+-----------------------------------------------------------------------------+
| HL-SMI Version:       hl-1.20.0-fw-58.1.1.1                                  |
| Driver Version:       1.20.0-bd87f71                                         |
| Nic Driver Version:   1.20.0-e4fe12d                                         |
|-------------------------------+----------------------+----------------------+
| AIP  Name        Persistence-M| Bus-Id        Disp.A | Volatile...
```
### Model description

Please add support for the [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) model.

### Open source status

- [ ] The model implementation is available
- [x] The model weights are available

### Provide...
### System Info

Testing on 2x 4070 TI Super

```
- MODEL_ID=unsloth/Qwen2.5-Coder-32B-bnb-4bit
- MODEL_ID=unsloth/Mistral-Small-24B-Instruct-2501-bnb-4bit
```

```
text-generation-inference-1  | [rank1]: │ /usr/src/server/text_generation_server/utils/weights.py:275 in get_sharded │
text-generation-inference-1  | [rank1]: │                                                                            │
text-generation-inference...
```
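A hedged debugging aid (not from the report): listing the tensor names, shapes, and dtypes inside the pre-quantized checkpoint shows exactly what `get_sharded` is being asked to split across the two GPUs. The shard filename below is a placeholder.

```python
# Hypothetical inspection of a downloaded safetensors shard from a bnb-4bit repo.
from safetensors import safe_open

with safe_open("model-00001-of-00002.safetensors", framework="pt", device="cpu") as f:
    for name in f.keys():
        tensor = f.get_tensor(name)
        print(name, tuple(tensor.shape), tensor.dtype)
```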