
Large Language Model Text Generation Inference

Results: 639 text-generation-inference issues, sorted by recently updated

### Feature request Support longer context, up to 8k tokens; the discussion and notebook below show promising results ### Motivation Discussion: https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/ Colab Notebook: https://colab.research.google.com/drive/1VI2nhlyKvd5cw4-zHvAIk00cAVj2lCCC#scrollTo=d2ceb547 ### Your contribution As it's only...
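The NTK-aware trick the linked thread describes boils down to rescaling the RoPE frequency base so low frequencies get interpolated while high frequencies stay mostly intact. A minimal sketch, assuming the `base * alpha^(dim/(dim-2))` scaling formula circulated in that thread (function names here are illustrative, not TGI's API):

```python
import math

def ntk_scaled_base(base: float, alpha: float, dim: int) -> float:
    # NTK-aware scaling: stretch the RoPE base by alpha^(dim/(dim-2)).
    # alpha=1.0 leaves the base unchanged (vanilla RoPE).
    return base * alpha ** (dim / (dim - 2))

def rope_inv_freq(dim: int, base: float = 10000.0, alpha: float = 1.0) -> list[float]:
    # Standard RoPE inverse frequencies, computed from the (possibly scaled) base.
    scaled = ntk_scaled_base(base, alpha, dim)
    return [scaled ** (-2 * i / dim) for i in range(dim // 2)]

# Example: alpha=4 is the kind of setting used to stretch a 2k-token
# LLaMA context toward 8k in the linked notebook.
freqs = rope_inv_freq(128, alpha=4.0)
```

The highest frequency (`freqs[0]`) stays at 1.0 regardless of `alpha`, which is why this scheme degrades short-context quality less than plain position interpolation.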

# What does this PR do? This adds a non-flash version of MPT. Flash is harder because we need to create a bias-ready CUDA kernel for flash attention....

# What does this PR do? Adds a new flag, propagated everywhere. Disjoint from `--quantize`, which also changes the actual dtype of the layers. Fixes #490 Fixes # (issue) ## Before...

I was wondering how GPU memory requirements vary with model size, request batch size, and max tokens. While doing some experiments where I needed the server to keep running...
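Beyond the weights themselves, the KV cache usually dominates the variable part of GPU memory, and it scales linearly in batch size and max tokens. A back-of-the-envelope estimator (the shapes below are a LLaMA-7B-like assumption in fp16, not a TGI formula):

```python
def kv_cache_bytes(n_layers: int, n_heads: int, head_dim: int,
                   max_tokens: int, batch: int, dtype_bytes: int = 2) -> int:
    # 2x for keys and values, one entry per layer, per head, per token,
    # per sequence in the batch.
    return 2 * n_layers * n_heads * head_dim * max_tokens * batch * dtype_bytes

# Assumed LLaMA-7B-like shape: 32 layers, 32 heads, head_dim 128.
# 8 concurrent sequences at 2048 tokens each:
gib = kv_cache_bytes(32, 32, 128, 2048, 8) / 2**30  # -> 8.0 GiB
```

Doubling either the batch size or the token budget doubles this figure, which matches the kind of growth observed when the server keeps long-running requests alive.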

### Feature request [Guidance](https://github.com/microsoft/guidance) can control the generated format, which would be a nice built-in feature - Add an extra parameter to the `/generate` and `/generate_stream` protocols to specify...
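The request amounts to one extra field in the existing JSON body. A sketch of what such a payload could look like, where `guidance_schema` is a hypothetical parameter name proposed for illustration (the `inputs`/`parameters` envelope is TGI's existing request shape):

```python
import json

payload = {
    "inputs": "Name three fruits:",
    "parameters": {
        "max_new_tokens": 64,
        # Hypothetical new field carrying the format constraint;
        # here a JSON Schema the generation should conform to.
        "guidance_schema": '{"type": "array", "items": {"type": "string"}}',
    },
}

# Body as it would be POSTed to /generate or /generate_stream.
body = json.dumps(payload)
```

Keeping the constraint inside `parameters` would leave existing clients untouched, since unknown fields there can simply be ignored by servers that do not implement the feature.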


### System Info Target: x86_64-unknown-linux-gnu Cargo version: 1.69.0 Commit sha: N/A Docker label: N/A nvidia-smi (Wed Jun 28 20:17:18 2023): NVIDIA-SMI 525.105.17, Driver Version 525.105.17, CUDA Version 12.0...

Currently it executes `gen-server` and `install-transformers` first, and only then upgrades pip. It should upgrade pip first. # What does this PR do? Makes sure pip is updated ##...

### Feature request I'm running TGI on Runpod and am trying to load a model from a private Hugging Face repository. Despite passing a value for HUGGINGFACE_HUB_TOKEN in Runpod's Environment...
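One thing worth checking: `huggingface_hub` has historically honored `HUGGING_FACE_HUB_TOKEN` (underscores between every word), so the variable name itself may be why the token is not picked up. A small sketch of env-based token resolution, where the exact fallback order is an assumption for illustration, not TGI's actual lookup:

```python
import os
from typing import Optional

def resolve_hf_token() -> Optional[str]:
    # Try the spelling huggingface_hub traditionally reads first,
    # then the variant commonly passed in container environments.
    for var in ("HUGGING_FACE_HUB_TOKEN", "HUGGINGFACE_HUB_TOKEN"):
        token = os.environ.get(var)
        if token:
            return token
    return None
```

Setting both spellings when launching the container is a cheap way to rule this out regardless of which one the running version reads.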

### System Info docker image: ghcr.io/huggingface/text-generation-inference:0.8 ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ] My own...