text-generation-inference issues

"Unauthorized for url: https://huggingface.co/api/models/bigcode/starcoder"

3

### System Info The API works with Bloom 560M, but when I try the model "bigcode/starcoder" I get this error: ``` 2023-06-08T09:42:38.813488Z ERROR text_generation_launcher: Download encountered an...

ArnaudHureaux

FlashLlama doesn't look for safetensors files

1

### Feature request Add safetensors support to FlashLlama. ### Motivation On server startup, bin files are automatically converted to safetensors format. But the FlashLlama class currently only looks for bin...

0x1997
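The requested behaviour is to prefer converted `.safetensors` weights and fall back to `.bin` files only when none exist. A minimal sketch of that lookup follows. It is not the actual FlashLlama code: the function name `select_weight_files` and the glob-based search are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def select_weight_files(model_dir: str) -> List[Path]:
    """Hypothetical helper: return the weight files a loader should use.

    Prefers *.safetensors files (e.g. those produced by the conversion that
    runs at server startup) and falls back to PyTorch *.bin files only when
    no safetensors files are present.
    """
    directory = Path(model_dir)

    safetensors_files = sorted(directory.glob("*.safetensors"))
    if safetensors_files:
        return safetensors_files

    bin_files = sorted(directory.glob("*.bin"))
    if bin_files:
        return bin_files

    raise FileNotFoundError(f"No .safetensors or .bin weights found in {model_dir}")
```

A real loader would also need to ignore `.bin` files that are not model weights (for example `training_args.bin`), which this sketch does not attempt.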

Inference support for GPTQ (llama + falcon tested) + Quantization script

37

Let's start discussing the implementation.
- Need to expose the quantization scripts (either include them here or add docs on how to use https://github.com/qwopqwop200/GPTQ-for-LLaMa)
- Make sure GPTQ works for multiple models...

Narsil

CUDA_VISIBLE_DEVICES

3

### System Info **Target:** x86_64-unknown-linux-gnu **Cargo version:** 1.69.0 **Docker label:** N/A **CUDA driver version:** 12000 **CUDA bare metal version:** 11.8 **Pytorch CUDA version:** 2.0.1+cu118 **System's CUDA version:** 11.8 **GPU supported...

antferdom

Request a new api endpoint to check and retrieve token length for given text/prompt

13

### Feature request Given that LLM models have a maximum token length, it would be better to have an API endpoint that checks the token length to...

Perseus14

Stale
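Until such an endpoint exists, the check it would perform can be approximated on the client. The sketch below assumes the client uses the same tokenizer as the server and knows the server's maximum input length, which is assumed here to match the launcher's `--max-input-length` value. `check_prompt_length`, `PromptLengthReport` and the whitespace `toy_tokenize` are hypothetical stand-ins. With Hugging Face `transformers`, `AutoTokenizer.from_pretrained(model_id).encode` could be passed as `tokenize` instead.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PromptLengthReport:
    n_tokens: int
    max_input_length: int

    @property
    def fits(self) -> bool:
        # True if the prompt is within the configured input limit.
        return self.n_tokens <= self.max_input_length

    @property
    def overflow(self) -> int:
        # Number of tokens that would have to be removed for the prompt to fit.
        return max(0, self.n_tokens - self.max_input_length)


def check_prompt_length(
    prompt: str,
    tokenize: Callable[[str], List[int]],
    max_input_length: int,
) -> PromptLengthReport:
    """Hypothetical helper: count the prompt's tokens and compare them with the limit."""
    token_ids = tokenize(prompt)
    return PromptLengthReport(n_tokens=len(token_ids), max_input_length=max_input_length)


def toy_tokenize(text: str) -> List[int]:
    # Toy tokenizer for illustration only: one id per whitespace-separated word.
    return [hash(word) % 32000 for word in text.split()]


if __name__ == "__main__":
    report = check_prompt_length("count these five tokens please", toy_tokenize, max_input_length=4)
    print(report.n_tokens, report.fits, report.overflow)  # 5 False 1
```

Returning a small report rather than a bare boolean lets the caller decide whether to truncate the prompt or reject it.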

Python wrapper for text-generation-benchmark utility

3

### Feature request Control the benchmarking utility's output format and location, so that benchmarks can be run through Python wrappers such as `subprocess` or `pexpect`. Writing the output to a file might...

antferdom

Stale

Client validation error when server generating <unk> token

7

### System Info text-generation-inference version: v0.8.2 text-generation version (python client): 0.6.0 gpu: nvidia A100 40G text-generation-launcher env: ```console > text-generation-launcher --env 2023-06-12T09:13:08.093676Z INFO text_generation_launcher: Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version:...

edwardzjl

Makefile typo and non posix

1

### System Info Not needed, bug explained ### Information - [ ] Docker - [X] The CLI directly ### Tasks - [X] An officially supported command - [ ] My...

piratos

How to solve "Model is overloaded" when sending 500 requests?

2

### System Info ``` 2023-06-12T09:06:08.879916Z INFO text_generation_launcher: Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.69.0 Commit sha: abd58ff82c37d5e4f131abdac3d298927a815604 Docker label: N/A nvidia-smi: Mon Jun 12 18:06:07 2023 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 515.65.07...

jshin49
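When a server rejects requests with an overload error, one common client-side mitigation is to cap the number of in-flight requests and retry rejected ones with exponential backoff. Below is a minimal asyncio sketch. It assumes a hypothetical `send_request` coroutine that raises the locally defined `OverloadedError`. Neither is the real `text_generation` client API. The default of 32 concurrent requests is an arbitrary example and should be kept at or below the server's concurrency limit, which is assumed here to be set by the launcher's `--max-concurrent-requests` option.

```python
import asyncio
import random
from typing import Awaitable, Callable, List


class OverloadedError(Exception):
    """Stand-in for the error a client raises on a 'Model is overloaded' response."""


async def send_with_limit(
    prompts: List[str],
    send_request: Callable[[str], Awaitable[str]],
    max_in_flight: int = 32,
    max_retries: int = 5,
    base_delay: float = 0.1,
) -> List[str]:
    """Send all prompts while keeping at most `max_in_flight` requests open.

    Requests rejected with OverloadedError are retried with exponential
    backoff plus jitter. Results are returned in the order of `prompts`.
    """
    semaphore = asyncio.Semaphore(max_in_flight)

    async def worker(prompt: str) -> str:
        attempt = 0
        while True:
            async with semaphore:
                try:
                    return await send_request(prompt)
                except OverloadedError:
                    if attempt >= max_retries:
                        raise
            # Back off outside the semaphore so waiting does not hold a slot.
            await asyncio.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
            attempt += 1

    return list(await asyncio.gather(*(worker(p) for p in prompts)))


if __name__ == "__main__":
    in_flight = 0
    peak = 0

    async def fake_send(prompt: str) -> str:
        # Simulated server call that records how many requests overlap.
        global in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        try:
            await asyncio.sleep(0.01)
            return prompt.upper()
        finally:
            in_flight -= 1

    results = asyncio.run(send_with_limit([f"req {i}" for i in range(500)], fake_send))
    print(len(results), peak <= 32)  # 500 True
```

The backoff sleep happens after the semaphore slot is released, so a retrying request does not prevent other queued requests from being sent.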

Using a model of type RefinedWeb to instantiate a model of type .

10

### System Info Running huggingface/text-generation-inference:0.8.2 on a kubernetes cluster. ``` 2023-06-13T15:28:49.039767Z INFO text_generation_launcher: Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.69.0 Commit sha: e7248fe90e27c7c8e39dd4cac5874eb9f96ab182 Docker label: sha-e7248fe nvidia-smi: Tue Jun 13...

egeucak

Stale
