
Large Language Model Text Generation Inference

Results: 639 text-generation-inference issues

### System Info ``` Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.80.0 Commit sha: a094729386b5689aabfba40b7fdb207142dec8d5 Docker label: sha-a094729 nvidia-smi: Mon Oct 21 10:38:14 2024 +-----------------------------------------------------------------------------------------+ | NVIDIA-SMI 550.54.14 Driver Version: 550.54.14...

### System Info latest docker ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My own...

### System Info Which image should I use on a MacBook Pro? I can't find an arm64 image. Please see the error below: ``` 1 warning found (use docker --debug...

### System Info - text-generation-inference:2.3.0, deployed on docker - model info: { "model_id": "meta-llama/Llama-3.1-8B-Instruct", "model_sha": "0e9e39f249a16976918f6564b8830bc894c89659", "model_pipeline_tag": "text-generation", "max_concurrent_requests": 128, "max_best_of": 2, "max_stop_sequences": 4, "max_input_tokens": 5000, "max_total_tokens": 6024, "validation_workers": 2,...
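The limits in the config above imply a fixed decode budget per request. A quick arithmetic check, assuming TGI's usual semantics where `max_total_tokens` bounds prompt plus generated tokens:

```python
# Values from the /info config above
max_input_tokens = 5000
max_total_tokens = 6024

# Tokens left for generation when the prompt uses the full input budget
decode_budget = max_total_tokens - max_input_tokens
print(decode_budget)  # 1024
```

So a request that fills the entire 5000-token input window can generate at most 1024 new tokens before hitting `max_total_tokens`.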

### System Info Specifically: bartowski/NemoMix-Unleashed-12B-GGUF/NemoMix-Unleashed-12B-Q4_K_M.gguf I tried: ``` command: --model-id bartowski/NemoMix-Unleashed-12B-GGUF/NemoMix-Unleashed-12B-Q4_K_M.gguf ``` but it failed. ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [...

### Feature request Add support for the gfx1101 and gfx1100 GPUs. Currently the [official docs indicate lack of support for this hardware](https://huggingface.co/docs/text-generation-inference/en/installation_amd). ### Motivation Allow developers who have a 7900xt...

### System Info Docker Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.80.0 Commit sha: 169178b937d0c4173b0fdcd6bf10a858cfe4f428 Docker label: sha-169178b nvidia-smi Args { model_id: "/share/base_model/Mistral-Nemo-Instruct-2407-GPTQ", revision: None, validation_workers: 2, sharded: None, num_shard: None,...

### Model description I'm creating this issue to gauge how interested people are in having the NVLM model added to TGI. If you would like to see it added, please...

### System Info TGI Docker Image: `ghcr.io/huggingface/text-generation-inference:sha-11d7af7-rocm` MODEL: meta-llama/Llama-3.1-405B-Instruct Hardware used: Intel® Xeon® Platinum 8470 2G, 52C/104T, 16GT/s, 105M Cache, Turbo, HT (350W) [x2] AMD MI300X GPU OAM 192GB 750W...

(noticed this error while working on https://github.com/huggingface/huggingface_hub/pull/2556) ### System Info Using TGI through the Inference API (e.g. [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407)). At the time of opening this issue, [`/info`](https://api-inference.huggingface.co/models/mistralai/Mistral-Nemo-Instruct-2407/info) returns ```js { "model_id": "mistralai/Mistral-Nemo-Instruct-2407",...
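For reference, a minimal sketch of pulling `model_id` out of a `/info` response. The payload here is reduced to the one field shown in the fragment above (the full response carries more fields), and in practice you would fetch it over HTTP (e.g. with `requests.get(...).json()`) rather than parse a literal string:

```python
import json

# Reduced /info payload; the real response contains additional fields (truncated above)
payload = '{"model_id": "mistralai/Mistral-Nemo-Instruct-2407"}'

info = json.loads(payload)
print(info["model_id"])  # mistralai/Mistral-Nemo-Instruct-2407
```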