text-generation-inference
Large Language Model Text Generation Inference
### System Info

Running Docker image version 2.4.0 with eetq quantization.

Model: microsoft/Phi-3.5-mini-instruct

```
{"model_id":"microsoft/Phi-3.5-mini-instruct","model_sha":"af0dfb8029e8a74545d0736d30cb6b58d2f0f3f0","model_pipeline_tag":"text-generation","max_concurrent_requests":128,"max_best_of":2,"max_stop_sequences":4,"max_input_tokens":2048,"max_total_tokens":4096,"validation_workers":2,"max_client_batch_size":4,"router":"text-generation-router","version":"2.4.0","sha":"0a655a0ab5db15f08e45d8c535e263044b944190","docker_label":"sha-0a655a0"}
```

Hardware: Google Kubernetes Engine, L4 GPU

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07...
```
### System Info

System:
`Linux 4.18.0-553.22.1.el8_10.x86_64 #1 SMP Wed Sep 25 09:20:43 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux`
`Rocky Linux 8.10`

Hardware:
GPU: `NVIDIA A100-SXM4-80GB`
CPU:
Architecture: x86_64
CPU op-mode(s):...
We are running Llama 3.1 70B on 2 A100 GPUs with 80 GB of memory each. From the logs we can see that the warmup phase succeeded in finding the right `max_batch_total_tokens` and that...
### Model description

Hi, I'm interested in adding support for Falcon-Mamba 7B to TGI. Here are some links for this model:

paper: https://arxiv.org/abs/2410.05355
model: https://huggingface.co/tiiuae/falcon-mamba-7b

### Open source status

-...
### System Info

Using prefix caching = True
Using Attention = flashinfer

```
WARNING 11-10 11:16:48 ray_utils.py:46] Failed to import Ray with ModuleNotFoundError("No module named 'ray'"). For distributed inference, please install...
```
### System Info

Hi all, I was installing from source and I got this error:

```
Building wheels for collected packages: vllm
  Building editable for vllm (pyproject.toml) ... error
  error: subprocess-exited-with-error...
```
### System Info

```
2024-11-06T04:38:58.950145Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.80.1
Commit sha: b1f9044d6cf082423a517cf9a6aa6e5ebd34e1c2
Docker label: sha-b1f9044
nvidia-smi:
Wed Nov  6 04:38:58 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03...
```
### Feature request

Add a new configuration parameter, `bigram_repetition_penalty`, to the Text Generation Inference module. This parameter would introduce a mechanism that penalizes repeated bigrams in generated text, similar to...
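The request above is truncated, but the mechanism it describes can be sketched. The following is a hypothetical illustration, not TGI's implementation: a token whose selection would complete a bigram already present in the generated sequence has its logit penalized, using the same divide-positive/multiply-negative convention as the standard HF-style `repetition_penalty`. The function name and signature are assumptions for illustration only.

```python
# Hypothetical sketch of a bigram repetition penalty (not TGI's actual code).
# A token completing an already-seen bigram (previous token, candidate token)
# gets its logit penalized before sampling.

def apply_bigram_repetition_penalty(logits, generated_ids, penalty):
    """Penalize logits of tokens that would repeat a seen bigram.

    logits: list[float], one score per vocabulary id
    generated_ids: list[int], token ids generated so far
    penalty: float > 1.0 strengthens the penalty; 1.0 is a no-op
    """
    if len(generated_ids) < 2 or penalty == 1.0:
        return list(logits)

    # All bigrams observed so far in the output.
    seen = set(zip(generated_ids, generated_ids[1:]))
    last = generated_ids[-1]

    out = list(logits)
    for token_id, score in enumerate(out):
        if (last, token_id) in seen:
            # Same convention as HF-style repetition penalty:
            # divide positive scores, multiply negative ones.
            out[token_id] = score / penalty if score > 0 else score * penalty
    return out
```

For example, after generating `[5, 7, 5]` the seen bigrams are `(5, 7)` and `(7, 5)`; since the last token is `5`, only a candidate `7` would repeat a bigram and be penalized.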
### System Info

We are using an EC2 instance with a T4 GPU in AWS (g4dn.2xlarge) for deploying our fine-tuned model.

### Information

- [X] Docker
- [ ] The CLI...
---

### System Info

**TGI Versions**:
- **2.4.0**: Deployment fails with the error: `ERROR text_generation_launcher: Error when initializing model`
- **2.1.1**: Deployment succeeds, but curl requests fail with the error:...