
Large Language Model Text Generation Inference

Results: 639 text-generation-inference issues

I did some extensive investigation, testing, and benchmarking, and determined that the following is needed to speed up inference for the Bigcode models (and most text-generation-inference models): 1. **Use `FlashAttention...

### Feature request Enable the use of locally stored adapters as created by huggingface/peft. Ideally, this should be compatible with the most notable benefits of TGI (e.g. sharing and flash...
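Until TGI can load adapters directly, a common workaround (assuming the adapter was created with huggingface/peft) is to merge the adapter into the base model offline with peft's `merge_and_unload`, then point TGI at the merged weights. A sketch — the model id, adapter path, and output path below are illustrative, not from the issue:

```shell
# Hypothetical offline merge of a local PEFT adapter, then serving the
# merged checkpoint with TGI. Paths and model ids are placeholders.
python - <<'EOF'
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")
model = PeftModel.from_pretrained(base, "/data/my-lora-adapter")
# Fold the adapter weights into the base model and save a plain checkpoint.
model.merge_and_unload().save_pretrained("/data/merged-model")
EOF

docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id /data/merged-model
```

The trade-off is that a merged checkpoint loses the ability to hot-swap adapters, which is exactly what this feature request is asking for.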

enhancement

#### Motivation Currently, to avoid OOM, you must set a "worst case" max batch size based on the desired max sequence length. This means that (a) throughput is unnecessarily limited...
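The alternative the issue hints at is admitting requests against a token budget rather than a fixed batch size: each request reserves its own worst-case token count, so short requests no longer pay for the global worst case. A minimal sketch (the function and its parameters are hypothetical, not TGI's actual scheduler):

```python
# Hypothetical token-budget admission instead of a fixed worst-case
# max batch size: admit requests while the sum of each request's
# worst-case token count still fits in a fixed token budget.
def admit(requests, token_budget):
    """requests: list of (request_id, max_total_tokens) tuples.

    Returns the admitted request ids and the tokens reserved for them.
    """
    batch, used = [], 0
    for req_id, max_tokens in requests:
        if used + max_tokens <= token_budget:
            batch.append(req_id)
            used += max_tokens
    return batch, used

# Example: with a 4096-token budget, the first three requests fit
# (1024 + 2048 + 512 = 3584) but the fourth (2048 more) does not.
batch, used = admit([("a", 1024), ("b", 2048), ("c", 512), ("d", 2048)], 4096)
```

Under this scheme a batch of short requests can be much larger than the static worst-case bound would allow, which is where the throughput win comes from.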

### Feature request

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

pretrained_model_dir = 'mosaicml/mpt-7b'
# `config` was undefined in the original snippet; presumably it is loaded
# from the checkpoint with remote code enabled.
config = AutoConfig.from_pretrained(pretrained_model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True, torch_dtype=torch.float16)
```

https://discuss.huggingface.co/t/how-to-use-trust-remote-code-true-with-load-checkpoint-and-dispatch/39849/1

### Motivation Support model-specific params such as `trust_remote_code` when loading models.

### Your contribution Sure

Prototype to greatly reduce the post-processing overhead at higher batch sizes.

# What does this PR do? Reworks the loading logic. The idea is to use cleaner loading code: - Remove the need for `no_init_weights` - Remove all the weird `bnb_linear` and `load_weights` logic and...

# What does this PR do? Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks...

### CI - [x] Add custom multi GPU runner to CI - [x] test docker image on MR - [x] load test on daily cron (low prio) ### server -...

### System Info text-generation-inference: v0.7.0 python: 3.9 Operating System: Ubuntu 18.04 When loading the chatglm model with the command: docker run --gpus '"device=3"' --shm-size 1g -p 8083:80 -v /data/llm:/data ghcr.io/huggingface/text-generation-inference:latest --model-id /data/chatglm-6b...

### System Info Using latest docker ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ] My own...