text-generation-inference
Large Language Model Text Generation Inference
### System Info The API works with Bloom 560M, but when I try the model "bigcode/starcoder" I get this error: ``` 2023-06-08T09:42:38.813488Z ERROR text_generation_launcher: Download encountered an...
### Feature request Add safetensors support to FlashLlama. ### Motivation On server startup, `.bin` files are automatically converted to safetensors format, but the FlashLlama class currently only looks for bin...
Let's start discussing the implementation. - Need to expose the quantization scripts (either include them here or add docs on how to use https://github.com/qwopqwop200/GPTQ-for-LLaMa) - Make sure GPTQ works for multiple models...
### System Info **Target:** x86_64-unknown-linux-gnu **Cargo version:** 1.69.0 **Docker label:** N/A **CUDA driver version:** 12000 **CUDA bare metal version:** 11.8 **Pytorch CUDA version:** 2.0.1+cu118 **System's CUDA version:** 11.8 **GPU supported...
### Feature request Given that LLMs have a maximum token length, it would be useful to have an API that can be invoked to check the token length to...
### Feature request Allow controlling the benchmarking utility's output format and location, so that the benchmark can be run through Python wrappers such as `subprocess` or `pexpect`. Writing the output to a file might...
### System Info text-generation-inference version: v0.8.2 text-generation version (python client): 0.6.0 gpu: nvidia A100 40G text-generation-launcher env: ```console > text-generation-launcher --env 2023-06-12T09:13:08.093676Z INFO text_generation_launcher: Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version:...
### System Info Not needed, bug explained ### Information - [ ] Docker - [X] The CLI directly ### Tasks - [X] An officially supported command - [ ] My...
### System Info ``` 2023-06-12T09:06:08.879916Z INFO text_generation_launcher: Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.69.0 Commit sha: abd58ff82c37d5e4f131abdac3d298927a815604 Docker label: N/A nvidia-smi: Mon Jun 12 18:06:07 2023 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 515.65.07...
### System Info Running huggingface/text-generation-inference:0.8.2 on a Kubernetes cluster. ``` 2023-06-13T15:28:49.039767Z INFO text_generation_launcher: Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.69.0 Commit sha: e7248fe90e27c7c8e39dd4cac5874eb9f96ab182 Docker label: sha-e7248fe nvidia-smi: Tue Jun 13...