text-generation-inference
Large Language Model Text Generation Inference
### System Info Any environment, as long as you run the TGI container with user != 0. ### Information - [x] Docker - [ ] The CLI directly ### Tasks -...
# What does this PR do? This is an investigation into why using `"return_full_text": true` as a parameter when hitting the `/generate` endpoint produces a valid translation while running...
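For context, a minimal Rust sketch of the kind of request the PR is investigating, using the `reqwest` (with the `blocking` and `json` features) and `serde_json` crates; the server address and prompt are illustrative, not taken from the PR:

```rust
use serde_json::json;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Assumes a TGI server is already listening on localhost:8080 (illustrative).
    let body = json!({
        "inputs": "Translate to French: Hello, world!",
        "parameters": {
            // With true, the response echoes the prompt followed by the generation;
            // with false (the default), only the newly generated text is returned.
            "return_full_text": true,
            "max_new_tokens": 64
        }
    });
    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:8080/generate")
        .json(&body)
        .send()?
        .json()?;
    println!("{}", resp["generated_text"]);
    Ok(())
}
```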
# What does this PR do? Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks...
### System Info `make install-server` does not support the Apple macOS Metal framework - Please either remove the brew/macOS info from the README altogether so as not to confuse users, - OR add...
### System Info Text Generation Inference API ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ]...
### System Info I used the official 3.0.2 Docker image to load a local Llama 3 Instruct model ### Information - [x] Docker - [ ] The CLI directly ### Tasks -...
Pulling from https://github.com/huggingface/optimum-neuron/pull/776 If a model is cached with a different configuration, I want to display alternative options to the user. If someone copies from the deploy code on Hugging...
### Feature request The Prometheus builder uses port 9000 by default. This can be changed with, e.g., `builder.with_http_listener(([0, 0, 0, 0], 9001))`; however, TGI doesn't support such configuration. Please implement...
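To make the request concrete, here is a minimal sketch of how a configurable port might be wired through the `metrics-exporter-prometheus` crate's `PrometheusBuilder`; the `prometheus_port` parameter is hypothetical, since exposing it is exactly what this feature request asks for:

```rust
use std::net::{Ipv4Addr, SocketAddr};
use metrics_exporter_prometheus::PrometheusBuilder;

/// Install the Prometheus exporter on a caller-chosen port instead of
/// the builder's default of 9000. `prometheus_port` is hypothetical;
/// TGI would need to surface it, e.g. as a CLI flag or env var.
fn install_metrics(prometheus_port: u16) {
    let addr = SocketAddr::from((Ipv4Addr::UNSPECIFIED, prometheus_port));
    PrometheusBuilder::new()
        .with_http_listener(addr)
        .install()
        .expect("failed to install Prometheus metrics exporter");
}
```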
Has anyone successfully installed TGI on a RISC-V architecture machine?