
Large Language Model Text Generation Inference

Results: 639 text-generation-inference issues, sorted by most recently updated.

### System Info Trying to access the serverless inference endpoints using the OpenAI-compatible route leads to status 400. ``` Invalid URL: missing field `name` ``` ### Information - [...
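
A minimal sketch of the kind of call this issue describes, assuming the `openai` Python client pointed at an OpenAI-compatible route; the base URL below is a placeholder, not the exact endpoint from the report:

```python
# Sketch only: reproduces the OpenAI-compatible call pattern referred to above.
# The base_url is a placeholder; substitute the serverless endpoint you are using.
from openai import OpenAI

client = OpenAI(
    base_url="https://<your-serverless-endpoint>/v1/",  # placeholder URL (assumption)
    api_key="hf_xxx",  # Hugging Face token
)

response = client.chat.completions.create(
    model="tgi",  # model name expected by the OpenAI-compatible route; "tgi" is an assumed placeholder
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```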

### System Info We are running the tgi container and a fastapi app that queries the model. I will refer to them as "tgi" and "llm-api". Both docker containers are...
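
A rough sketch of the setup described here, with the "llm-api" FastAPI service forwarding prompts to the "tgi" container over TGI's `/generate` route; the container hostname, port, and route names are assumptions:

```python
# Sketch of an "llm-api" FastAPI service that forwards prompts to a "tgi" container.
# The hostname "tgi" and port 80 assume both containers share a Docker network.
import httpx
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
TGI_URL = "http://tgi:80/generate"  # assumed service name and port

class Prompt(BaseModel):
    inputs: str
    max_new_tokens: int = 128

@app.post("/complete")
async def complete(prompt: Prompt):
    # Forward the prompt to TGI and return its JSON response unchanged.
    async with httpx.AsyncClient(timeout=60.0) as client:
        resp = await client.post(
            TGI_URL,
            json={
                "inputs": prompt.inputs,
                "parameters": {"max_new_tokens": prompt.max_new_tokens},
            },
        )
        resp.raise_for_status()
        return resp.json()
```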

### System Info #### versions: - text-generation-inference: latest docker image - os: Debian GNU/Linux 11 - model: llava-hf/llava-v1.6-mistral-7b-hf ### Information - [x] Docker - [ ] The CLI directly ###...

In other inference APIs, `response_format={"type": "json_object"}` restricts the model output to be a valid JSON object without enforcing a schema. Right now this is not supported: ``` Failed to deserialize...
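
For reference, this is the request shape the feature asks for, sketched with the `openai` client against a local TGI instance (port and model name are assumptions); per the report, TGI currently rejects it with a deserialization error:

```python
# Sketch: schema-free JSON mode as exposed by other inference APIs.
# According to the issue, TGI currently fails to deserialize this response_format.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1/", api_key="-")  # assumed local TGI

completion = client.chat.completions.create(
    model="tgi",
    messages=[{"role": "user", "content": "List three fruits as JSON."}],
    response_format={"type": "json_object"},  # no schema, just "must be valid JSON"
)
print(completion.choices[0].message.content)
```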

### System Info Tested with text-generation-inference 2.4.0 and 3.0.0 Docker containers running the CLI from within on Sagemaker Real-time Inference (NVIDIA driver 535.216.01) ### Information - [x] Docker - [x]...

### Feature request TGI should read the config.json and apply the rope scaling and factor given in its parameters. ### Motivation Many inference engines auto-apply the rope scaling and rope...
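
As a sketch of what "read the config.json" means in practice, the rope-scaling fields can be inspected with `transformers`; the exact keys vary by model family, and the Llama-style `rope_scaling` layout shown here is an assumption:

```python
# Sketch: inspect the rope scaling settings the feature asks TGI to auto-apply.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")  # example model id
rope = getattr(config, "rope_scaling", None)
if rope:
    # Llama 3.1-style configs expose e.g. {"rope_type": "llama3", "factor": 8.0, ...}
    print("rope scaling:", rope)
else:
    print("no rope_scaling entry in config.json")
```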

### Model description **[MiniCPM-o-2_6](https://github.com/OpenBMB/MiniCPM-o)** is the latest and most capable model in the MiniCPM-o series. The model is built in an end-to-end fashion based on SigLip-400M, Whisper-medium-300M, ChatTTS-200M, and Qwen2.5-7B...

### Feature request Do you plan on integrating dynamic serving of LoRA modules, so that new modules can be added / removed during runtime instead of having to restart the...
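
For context, a sketch of how adapters are selected today, assuming TGI's multi-LoRA support where adapters are listed at launch (e.g. via a `LORA_ADAPTERS` environment variable) and chosen per request with an `adapter_id` parameter; the request above asks for that startup list to become mutable at runtime:

```python
# Sketch: per-request adapter selection against a TGI server whose adapters were
# fixed at startup (assumed LORA_ADAPTERS=... when launching the container).
import requests

resp = requests.post(
    "http://localhost:8080/generate",  # assumed local TGI endpoint
    json={
        "inputs": "Summarize this ticket: ...",
        "parameters": {
            "max_new_tokens": 64,
            "adapter_id": "predibase/customer_support",  # must already be loaded at startup
        },
    },
    timeout=60,
)
print(resp.json())
```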

### Feature request Support the use of XGrammar instead of Outlines as the backend for structured-output generation. ### Motivation XGrammar has been shown to be much faster than Outlines for generation...
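
For reference, a sketch of the structured-output request TGI serves today through its grammar parameter (currently backed by Outlines); the field layout follows TGI's guidance documentation but should be treated as an assumption:

```python
# Sketch: JSON-schema-constrained generation via TGI's grammar parameter,
# which this feature proposes backing with XGrammar instead of Outlines.
import requests

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

resp = requests.post(
    "http://localhost:8080/generate",  # assumed local TGI endpoint
    json={
        "inputs": "Extract the person: Alice is 31 years old.",
        "parameters": {
            "max_new_tokens": 64,
            "grammar": {"type": "json", "value": schema},
        },
    },
    timeout=60,
)
print(resp.json())
```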