
Large Language Model Text Generation Inference

Results 639 text-generation-inference issues

### System Info - AWS `sagemaker` 2.163.0 - g5.12xlarge instance type with 4 NVIDIA A10G GPUs and 96GB of GPU memory ### Information - [X] Docker - [ ] The...

### Feature request [vLLM](https://github.com/vllm-project/vllm) is fast with efficient management of attention key and value memory with PagedAttention, serving higher throughput than TGI. ### Motivation Adopting PagedAttention would increase throughput and...

# What does this PR do? Adds an integration test on llama-7b-gptq Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs...

### Feature request Add an additional option to specify `use_fast` flag for `AutoTokenizer`. ### Motivation Some models have slightly different behavior, or buggy versions, of slow or fast tokenizer. It...

### System Info Following the guide given on https://huggingface.co/blog/sagemaker-huggingface-llm, trying to deploy a fine tuned Falcon 7B model yields the following errors: ``` Error: DownloadError File "/opt/conda/lib/python3.9/site-packages/text_generation_server/utils/convert.py", line 84, in...

I encountered an issue while using the Falcon 40B Instruct model. Here are the steps I followed: I instantiated the model using the following command: ``` docker run --gpus all...

Should be more robust to shared tensors (ok when using `from_pretrained`). But forcing us to add new checks in our loading code (since the chosen key to keep might be...

### System Info ## Problem Using the 0.8 (0.8.2) container with `--model-id tiiuae/falcon-40b-instruct --num-shard 2` on runpod.io with 2xA100 80GB On startup it starts loading the 2 shards but they...

Stale

### Feature request I was wondering if there will be support for the newly released [mpt-30b-instruct](https://huggingface.co/mosaicml/mpt-30b-instruct) ### Motivation It's not possible to use the `mosaicml/mpt-30b-instruct` model: `ValueError: sharded is not...