text-generation-inference
Large Language Model Text Generation Inference
### Feature request It seems we now have support for loading models with 4-bit quantization starting from bitsandbytes>=0.39.0 Link: [FP4 Quantization](https://huggingface.co/docs/transformers/main_classes/quantization#fp4-quantization) ### Motivation Running really large language models on smaller...
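The memory savings that motivate 4-bit loading are easy to estimate: weights stored at 4 bits occupy a quarter of their fp16 footprint. A back-of-the-envelope sketch (the 40B parameter count and helper name are illustrative, not taken from the issue):

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for model weights alone, in decimal GB."""
    return num_params * bits_per_param / 8 / 1e9

# A hypothetical 40B-parameter model:
fp16_gb = weight_memory_gb(40e9, 16)  # 80.0 GB in fp16
fp4_gb = weight_memory_gb(40e9, 4)    # 20.0 GB in 4-bit
print(fp16_gb, fp4_gb)
```

In transformers, the linked FP4 path is enabled by passing `load_in_4bit=True` (or a `BitsAndBytesConfig` with `load_in_4bit=True`) to `from_pretrained`, which requires `bitsandbytes>=0.39.0`.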
### System Info I'm on an Ubuntu server from https://console.paperspace.com/ with this GPU: | NVIDIA-SMI 515.105.01 Driver Version: 515.105.01 CUDA Version: 11.7 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id...
### Feature request GPTQ is not yet supported in the current version. Is there any timeline for adding it? ### Motivation quantization ### Your contribution I can help with code if needed
### Feature request Hi, I was able to deploy the base Falcon-40B model to SageMaker using the TGI DLC by following [this blog post](https://www.philschmid.de/sagemaker-falcon-llm) I also recently fine-tuned the Falcon-40B...
What do you think @Narsil? Maybe we can hide it behind a cargo feature, but then it's a bit of a mess in the docker container. We will need to build multiple...
Currently, I am running Falcon quantized on 4x NVIDIA T4 GPUs, all on the same system. I am seeing a `time_per_token` of around 190 ms during inference. Below is...
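A per-token latency like the 190 ms above can be translated into throughput and end-to-end response time to judge whether the setup is behaving reasonably. A purely arithmetic sketch (not a benchmark; the 100-token response length is an illustrative assumption):

```python
def throughput_tokens_per_s(time_per_token_ms: float) -> float:
    """Sequential decode throughput implied by a per-token latency."""
    return 1000.0 / time_per_token_ms

def response_latency_s(time_per_token_ms: float, num_tokens: int) -> float:
    """Wall-clock time to decode `num_tokens` tokens one at a time."""
    return time_per_token_ms * num_tokens / 1000.0

print(round(throughput_tokens_per_s(190), 2))  # ~5.26 tokens/s
print(response_latency_s(190, 100))            # 19.0 s for a 100-token reply
```

Note that `time_per_token` measures sequential decoding, so sharding across the four T4s mainly reduces per-GPU memory pressure rather than multiplying this number by four.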
### System Info 2023-06-15T16:56:34.095240Z INFO text_generation_launcher: Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.69.0 Commit sha: e7248fe90e27c7c8e39dd4cac5874eb9f96ab182 Docker label: sha-e7248fe nvidia-smi: Thu Jun 15 16:56:34 2023 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 515.105.01 Driver...
### System Info ``` 2023-06-15T04:27:53.010592Z INFO text_generation_launcher: Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.69.0 Commit sha: 5ce89059f8149eaf313c63e9ded4199670cd74bb Docker label: sha-5ce8905 nvidia-smi: Thu Jun 15 04:27:51 2023 +-----------------------------------------------------------------------------+ | NVIDIA-SMI...
# What does this PR do? Fixes https://github.com/huggingface/text-generation-inference/issues/420 ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the...
### Feature request When users save their model with `Trainer.save_model()` or `Trainer.push_to_hub()`, the training arguments are saved as a pickle file called `training_arguments.bin`. This causes a problem during the `safetensors`...
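The underlying concern is that a pickle file cannot be audited the way `safetensors` or JSON can. The same information could be persisted as JSON instead; a stdlib-only sketch using a stand-in dataclass (the field names are illustrative, not the real `TrainingArguments` schema):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ToyTrainingArguments:
    # Stand-in for transformers.TrainingArguments; fields are illustrative.
    learning_rate: float = 5e-5
    num_train_epochs: int = 3
    output_dir: str = "out"

def save_args_as_json(args: ToyTrainingArguments, path: str) -> None:
    """Serialize training arguments to human-readable JSON instead of pickle."""
    with open(path, "w") as f:
        json.dump(asdict(args), f, indent=2)

save_args_as_json(ToyTrainingArguments(), "training_arguments.json")
with open("training_arguments.json") as f:
    print(json.load(f)["num_train_epochs"])
```

The real `TrainingArguments` class already exposes `to_dict()` and `to_json_string()`, so a JSON export along these lines would not require custom serialization logic.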