text-generation-inference
Large Language Model Text Generation Inference
### Feature request
Longer context, up to 8k tokens; the linked discussion and notebook show promising results.

### Motivation
Discussion: https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/
Colab Notebook: https://colab.research.google.com/drive/1VI2nhlyKvd5cw4-zHvAIk00cAVj2lCCC#scrollTo=d2ceb547

### Your contribution
As it's only...
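For context, the NTK-aware approach from the linked discussion keeps the usual rotary-embedding formula and only rescales the rotary base as a function of a scaling factor `alpha`, so high-frequency components are barely touched while low-frequency ones are stretched. A minimal sketch of that base adjustment (function names and the default `alpha` are illustrative, not TGI's implementation):

```python
import torch

def ntk_scaled_inv_freq(dim: int, base: float = 10000.0, alpha: float = 4.0) -> torch.Tensor:
    """Inverse frequencies for NTK-aware scaled RoPE.

    The rotary base is rescaled by alpha ** (dim / (dim - 2)), which roughly
    extends the usable context by a factor of `alpha` without fine-tuning.
    """
    scaled_base = base * alpha ** (dim / (dim - 2))
    return 1.0 / (scaled_base ** (torch.arange(0, dim, 2).float() / dim))

def rope_cos_sin(seq_len: int, dim: int, alpha: float = 4.0):
    """Precompute cos/sin tables for positions 0..seq_len-1 with the scaled base."""
    inv_freq = ntk_scaled_inv_freq(dim, alpha=alpha)
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)         # (seq_len, dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)  # (seq_len, dim)
    return emb.cos(), emb.sin()
```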
# What does this PR do?
This adds a non-flash version of MPT. Flash is harder because we would need a flash-attention CUDA kernel that accepts an attention bias....
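MPT's attention bias is ALiBi, which is why the flash path needs a kernel that can take a bias term. As a rough sketch of the non-flash path only, here is how such a bias is typically added to the attention scores (a plain PyTorch illustration, assuming a power-of-two head count; this is not the code from this PR):

```python
import math
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    """Per-head geometric slopes as in the ALiBi paper (power-of-two head counts)."""
    start = 2.0 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])

def attention_with_alibi(q, k, v, causal_mask):
    """Non-flash attention: softmax(QK^T / sqrt(d) + alibi_bias + mask) V.

    q, k, v: (batch, heads, seq, head_dim); causal_mask is additive, -inf above the diagonal.
    """
    _, heads, seq, dim = q.shape
    scores = q @ k.transpose(-1, -2) / math.sqrt(dim)      # (b, h, s, s)
    positions = torch.arange(seq)
    distance = positions[None, :] - positions[:, None]     # j - i, <= 0 at and below the diagonal
    bias = alibi_slopes(heads)[:, None, None] * distance   # linear distance penalty per head
    scores = scores + bias + causal_mask
    return torch.softmax(scores, dim=-1) @ v
```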
# What does this PR do?
Adds a new flag, propagated everywhere. It is disjoint from `--quantize`, which also changes the actual dtype of layers.
Fixes #490
## Before...
I was just wondering how GPU memory requirements vary with model size, request batch size, and max tokens. In doing some experiments where I needed the server to keep running...
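As a rough frame for the question, memory splits into the static weights and a KV cache that grows with batch size and sequence length. A back-of-envelope sketch (the model shape in the example is an assumption, not a measurement, and the estimate ignores activations, the CUDA context, and fragmentation):

```python
def estimate_gpu_memory_gib(
    n_params_billion: float,
    n_layers: int,
    n_heads: int,
    head_dim: int,
    batch_size: int,
    seq_len: int,
    bytes_per_param: int = 2,  # fp16 / bf16 weights
    bytes_per_kv: int = 2,     # fp16 KV cache
) -> float:
    """First-order estimate: weights + KV cache, in GiB."""
    weights = n_params_billion * 1e9 * bytes_per_param
    # KV cache: two tensors (K and V) per layer, per head, per token in the batch.
    kv_cache = 2 * n_layers * n_heads * head_dim * batch_size * seq_len * bytes_per_kv
    return (weights + kv_cache) / 1024**3

# Example: a hypothetical 7B model (32 layers, 32 heads of dim 128) serving a
# batch of 8 requests at 2048 total tokens each.
print(round(estimate_gpu_memory_gib(7, 32, 32, 128, batch_size=8, seq_len=2048), 1), "GiB")
```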
### Feature request
[Guidance](https://github.com/microsoft/guidance) can control the generated format, which could be a nice feature if it is built-in:
- Add an extra parameter to the `/generate` and `/generate_stream` protocol to specify...
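To illustrate the shape of the proposal, `/generate` already takes `inputs` and `parameters`; the sketch below adds a hypothetical `guidance_template` field (not an existing TGI parameter) just to show where a constraint specification could live:

```python
import requests

# `guidance_template` is hypothetical: it stands in for whatever field the
# feature would add; the template syntax mimics microsoft/guidance.
payload = {
    "inputs": "Generate a user record for Alice.",
    "parameters": {
        "max_new_tokens": 128,
        "guidance_template": "{\"name\": \"{{gen 'name'}}\", \"age\": {{gen 'age'}}}",
    },
}
resp = requests.post("http://localhost:8080/generate", json=payload, timeout=60)
print(resp.json())
```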
### System Info
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: N/A
Docker label: N/A
nvidia-smi (Wed Jun 28 20:17:18 2023): NVIDIA-SMI 525.105.17, Driver Version 525.105.17, CUDA Version 12.0...
Currently it executes gen-server and install-transformers first, and only then starts upgrading pip. It should upgrade pip first.
# What does this PR do?
Make sure pip is updated ##...
### Feature request
I'm running TGI on Runpod and am trying to load a model from a private Hugging Face repository. Despite passing in a value for HUGGINGFACE_HUB_TOKEN via Runpod's Environment...
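One quick way to narrow down whether the token actually reaches the container and is valid at all, independently of TGI, is to check it with `huggingface_hub` directly. A small sketch, assuming the same environment variable is visible to the process:

```python
import os
from huggingface_hub import HfApi

# Read the token the way the issue passes it: through the environment.
token = os.environ.get("HUGGINGFACE_HUB_TOKEN") or os.environ.get("HUGGING_FACE_HUB_TOKEN")
if token is None:
    raise SystemExit("Token environment variable is not visible inside the container")

api = HfApi(token=token)
print(api.whoami())  # raises if the token is invalid or lacks access to the repo
```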
### System Info
Docker image: ghcr.io/huggingface/text-generation-inference:0.8

### Information
- [X] Docker
- [ ] The CLI directly

### Tasks
- [X] An officially supported command
- [ ] My own...