text-generation-inference
Large Language Model Text Generation Inference
This PR makes tool calling aware of the name of the function selected. Fixes: https://github.com/huggingface/text-generation-inference/issues/1657 Thank you @puppetm4st3r for the helpful snippets; large parts of this PR are simply refactors...
# What does this PR do? - Changed all models to extract `embed_tokens` in order to enable llava to separately call the embeddings and the core model layers. - Added...
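The `embed_tokens` change described above can be illustrated with a minimal sketch (the class and tensor shapes here are illustrative stand-ins, not TGI's actual model code): exposing the embedding layer lets a multimodal wrapper compute text embeddings, splice in projected image features, and then run the core layers on `inputs_embeds` rather than `input_ids`.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy causal-LM skeleton with an exposed `embed_tokens` layer."""

    def __init__(self, vocab: int = 100, dim: int = 16):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab, dim)
        self.layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(2)])

    def forward(self, input_ids=None, inputs_embeds=None):
        # Accept either token ids or pre-computed embeddings.
        if inputs_embeds is None:
            inputs_embeds = self.embed_tokens(input_ids)
        h = inputs_embeds
        for layer in self.layers:
            h = layer(h)
        return h

model = TinyLM()
ids = torch.tensor([[1, 2, 3]])
text_emb = model.embed_tokens(ids)        # step 1: embeddings only
image_emb = torch.zeros(1, 4, 16)         # stand-in for projected image features
merged = torch.cat([image_emb, text_emb], dim=1)
out = model(inputs_embeds=merged)         # step 2: core layers on merged embeds
```

This is the pattern that lets a LLaVA-style model interleave image and text embeddings before the transformer stack runs.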
Wrap `text-generation-launcher` in the Docker image and mask `ldconfig` failures from the user (not needed in most cases anyway)
WIP: This PR explores the performance differences from using `torch.compile` on select ops and starts work on reproducible benchmarks
### Feature request Hello, thank you for all the work! With the new NVIDIA partnership supplying H100 GPUs, could you please implement FP8 TransformerEngine speedup? ### Motivation That would mean...
This PR allows the `CompletionRequest.prompt` to be sent as a string or array of strings. When an array is sent the first value will be used if it's a string;...
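The accept-string-or-array behavior described in this PR can be sketched as a small normalization helper (a minimal illustration of the stated semantics, not TGI's actual router code, which is written in Rust):

```python
from typing import List, Union

def resolve_prompt(prompt: Union[str, List[str]]) -> str:
    """Accept `prompt` as a string or an array of strings.

    When an array is sent, the first element is used if it is a
    string; otherwise the request is rejected.
    """
    if isinstance(prompt, str):
        return prompt
    if isinstance(prompt, list) and prompt and isinstance(prompt[0], str):
        return prompt[0]
    raise ValueError("prompt must be a string or a non-empty array of strings")
```

This mirrors the OpenAI-style `CompletionRequest` shape, where clients may send either form.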
### System Info
```
text-generation-launcher --env
2024-04-01T20:49:45.871764Z INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.75.0
Commit sha: e6bb3ff81fd670ad2f54904676f8165367dd47f8
Docker label: sha-e6bb3ff
```
### Information - [X] Docker - [...
### Feature request Add support for the exl2 quantization format via the argument `--quantization exl2`, which will allow loading exllamav2-quantized models with various quantization schemes (not GPTQ). ### Motivation There...
### Feature request On the `/tokenize` endpoint of TGI, add an option to apply the chat template from the model's tokenizer, if one exists, before tokenizing. ### Motivation The `/tokenize` endpoint...
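The requested flow can be sketched as follows. The template and tokenizer below are deliberately trivial stand-ins: a real implementation would call `tokenizer.apply_chat_template(...)` from `transformers` and the model's own tokenizer, but this shows the shape of the option's semantics.

```python
from typing import Dict, List

def render_chat(messages: List[Dict[str, str]]) -> str:
    # Stand-in for tokenizer.apply_chat_template(messages, tokenize=False).
    return "".join(f"<|{m['role']}|>{m['content']}<|end|>" for m in messages)

def tokenize(text: str) -> List[str]:
    # Whitespace split stands in for the real model tokenizer.
    return text.split()

def tokenize_request(messages: List[Dict[str, str]],
                     apply_chat_template: bool) -> List[str]:
    # With the proposed option set, the chat template is rendered first;
    # otherwise the raw message content is tokenized as before.
    if apply_chat_template:
        text = render_chat(messages)
    else:
        text = " ".join(m["content"] for m in messages)
    return tokenize(text)
```

The point of the option is that the token count returned by `/tokenize` then matches what the model actually sees after templating.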
### System Info Latest Docker version ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My own modifications...