Large Language Model Text Generation Inference

Results 639 text-generation-inference issues

It seems to work fine and loads 4-10x faster for me depending on the storage/page cache (non-sharded 20B parameter model). However, when loaded this way, inference appears to be 10-15%...

Benefits: - Centralizes this logic that's on the critical inference loop path and does it in rust instead of python - Simplifies python side of the code, decoupling next-token generation...

### Feature request In addition to per-request metrics such as `tgi_request_count`, it would be useful to have utilization metrics returned on a per-interval basis, similar to the way Triton...

### Feature request Improved `README.md` for the benchmarking utility that explains the different command line arguments. ### Motivation The benchmarking tool is awesome, I would just like to have some...

### System Info ```bash docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:0.8 --model-id tiiuae/falcon-7b --num-shard 2 ``` ``` ubuntu@ip-172-31-35-173:~$ nvidia-smi Wed Jun 7 09:30:53 2023 +-----------------------------------------------------------------------------+ |...

### System Info ``` 2023-06-07T08:37:39.808440Z INFO text_generation_launcher: Runtime environment: Target: x86_64-unknown-linux-gnu Cargo version: 1.69.0 Commit sha: 19c41824cb11ba1a3b60a2a65274d8c074383de3 Docker label: N/A nvidia-smi: Wed Jun 7 17:37:39 2023 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 515.65.07...

### System Info running on single a100 with 16c and 128g ram ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [ ] An officially...

### System Info `max_total_tokens` is hardcoded to 1512 and can't be changed from SageMaker. ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [...

### System Info I understand that you are experiencing the following error: Server error: Expected (head_size % 8 == 0) && (head_size