
Large Language Model Text Generation Inference

Results: 639 text-generation-inference issues

### Feature request It would be great if we could specify the model's datatype as a command-line argument. ### Motivation For most PyTorch models, `torch_dtype` is currently set to `torch.float16`...
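A minimal sketch of the idea, assuming a hypothetical `--dtype` flag wired through to `torch_dtype` at load time (the flag name and model id here are illustrative):

```
import argparse

import torch
from transformers import AutoModelForCausalLM

# Hypothetical --dtype flag mapped onto torch_dtype when loading the model.
parser = argparse.ArgumentParser()
parser.add_argument("--dtype", choices=["float16", "bfloat16", "float32"],
                    default="float16")
args = parser.parse_args()

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # placeholder model id
    torch_dtype=getattr(torch, args.dtype),
)
```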

### System Info Versions: python 3.10, sagemaker 2.168.0 (latest), huggingface tgi 0.8.2 (latest) ### Reproduction I'm trying to deploy MPT-30B-instruct and WizardLM-Uncensored-Falcon-40b in SageMaker; my config is `config = ...`
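For context, a typical TGI-on-SageMaker deployment looks roughly like the sketch below; the model id, instance type, and environment values are illustrative, not the reporter's exact config:

```
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()
image_uri = get_huggingface_llm_image_uri("huggingface", version="0.8.2")

# Illustrative deployment: HF_MODEL_ID selects the model, SM_NUM_GPUS the shards.
model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "mosaicml/mpt-30b-instruct",
        "SM_NUM_GPUS": "4",
    },
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
)
```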

### System Info

```
model=gpt2
volume=$HOME/.cache/huggingface/hub
num_shard=1
docker run --gpus all --shm-size 1g -p 8081:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:0.8.1 \
    --model-id $model --num-shard $num_shard
```

### Information - [X] Docker - [ ] The...

Stale

### Feature request Because `/generate_stream` delivers token streaming over Server-Sent Events (SSE), the client has no way to tell the server to stop streaming. ### Motivation Sometimes, when requesting very large...
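Today the only workaround is client-side: closing the HTTP connection. A minimal sketch, with illustrative URL and payload:

```
import requests

resp = requests.post(
    "http://localhost:8080/generate_stream",
    json={"inputs": "Write a long story", "parameters": {"max_new_tokens": 1000}},
    stream=True,
)
for i, line in enumerate(resp.iter_lines()):
    if line:
        print(line.decode())
    if i > 10:          # the client decides it has seen enough
        resp.close()    # tear down the connection to stop the stream
        break
```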

The Falcon 40B model occupies nearly 90 GB of disk space, takes a long time to download, and gives no indication of download progress. Is there any other way to download the...
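One option is to pre-fetch the weights with `huggingface_hub`, which shows per-file progress bars and resumes interrupted downloads; the target path below is illustrative:

```
from huggingface_hub import snapshot_download

# Pre-download the repository so the server can start from a local copy;
# progress is shown per file and interrupted downloads can resume.
snapshot_download(repo_id="tiiuae/falcon-40b", local_dir="/data/falcon-40b")
```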

### Feature request Do you have any suggestions on how to make the server more compatible with the OpenAI API? ### Motivation Projects like https://github.com/go-skynet/LocalAI or https://github.com/lm-sys/FastChat ### Your contribution discuss
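To make the idea concrete, a hypothetical adapter could translate an OpenAI-style completion request into a TGI `/generate` call; the TGI request and response fields are real, but the adapter itself is only a sketch:

```
import requests

def openai_completion(prompt, max_tokens=64, temperature=1.0):
    # Translate OpenAI-style arguments into TGI's /generate payload.
    r = requests.post(
        "http://localhost:8080/generate",
        json={
            "inputs": prompt,
            "parameters": {
                "max_new_tokens": max_tokens,
                "temperature": temperature,
            },
        },
    )
    text = r.json()["generated_text"]
    return {"choices": [{"text": text}]}  # OpenAI-shaped response
```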

## Feature request The feature would be to support accelerated inference with the CTranslate2 framework: https://github.com/OpenNMT/CTranslate2 ## Motivation Reasons to use CTranslate2: #### faster float16 generation In my case, it outperforms vLLM...
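For reference, float16 generation with CTranslate2 looks roughly like this, assuming the model has already been converted with `ct2-transformers-converter` (paths and model id are illustrative):

```
import ctranslate2
from transformers import AutoTokenizer

generator = ctranslate2.Generator("ct2_model_dir", device="cuda",
                                  compute_type="float16")
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model id

# CTranslate2 consumes token strings rather than raw text.
tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode("Hello, world"))
results = generator.generate_batch([tokens], max_length=64, sampling_topk=10)
print(tokenizer.decode(results[0].sequences_ids[0]))
```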

Stale

# What does this PR do? This PR automatically points tensors that were removed due to deduplication back to their still-existing twin. In `server.text_generation_server.utils.convert.py#convert_file`, tensors that have a value equal...
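A sketch of the underlying idea, with illustrative names: tensors backed by the same memory are saved once, and each removed name is recorded as an alias of its kept twin so it can be pointed back at load time:

```
from collections import defaultdict

import torch

def find_shared_tensors(state_dict):
    # Group tensors by their underlying memory address (a simplification:
    # data_ptr() identifies tensors that start at the same storage offset).
    by_storage = defaultdict(list)
    for name, tensor in state_dict.items():
        by_storage[tensor.data_ptr()].append(name)
    # Keep one tensor per group; map every removed name to its twin.
    aliases = {}
    for names in by_storage.values():
        kept, *removed = sorted(names)
        for name in removed:
            aliases[name] = kept
    return aliases
```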

### System Info Hi, I'm using the latest version of text-generation-inference (image sha-ae466a8) on RunPod via Docker. When I try to load a GPTQ file from local disk with QUANTIZE...

### Feature request Came across this article `https://kaiokendev.github.io/til#extending-context-to-8k` which suggests that, by interpolating position indices, we can potentially extend the model's context length.

```
# These two lines:
self.scale = 1 / ...
```
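A self-contained sketch of that position-interpolation trick inside a rotary-embedding cache (helper name and defaults are illustrative): positions beyond the trained window are compressed back into it by a constant scale factor.

```
import torch

def build_rope_cache(seq_len, dim, base=10000.0, max_trained_len=2048):
    # Compress out-of-range positions back into the trained window.
    scale = max_trained_len / seq_len if seq_len > max_trained_len else 1.0
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    t = torch.arange(seq_len).float() * scale  # interpolated positions
    freqs = torch.outer(t, inv_freq)
    return freqs.cos(), freqs.sin()  # rotation tables for queries/keys
```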