
Large Language Model Text Generation Inference

Results: 639 text-generation-inference issues

# What does this PR do? Include `/health` in the OpenAPI doc ## Before submitting - [x] This PR fixes a typo or improves the docs (you can dismiss the other...

### Model description Can you add a new MPT model? This looks very promising, especially the ability to extend context length by up to 85K tokens. ### Open source status...

Similar to the work done in [langchain-llm-api](https://github.com/1b5d/langchain-llm-api), I would like to see the ability to use this natively within langchain. Are there any plans to do so, such that the models...

Stale

### Feature request Hello! It would be awesome to have LLaVa support (upload an image to the API and have it embed it via CLIP etc) https://github.com/haotian-liu/LLaVA text-generation-webui already has...

Stale

I tried to start a large version of the model using Docker: `docker run -p 10249:80 -e RUST_BACKTRACE=full -e FLASH_ATTENTION=1 -e CUDA_VISIBLE_DEVICES=4,7 --privileged --security-opt="seccomp=unconfined" -v /download:/data ghcr.io/huggingface/text-generation-inference:0.5 --model-id /data/llama-13b-hf --num-shard...

bug

Are there any plans to support logit processors via an additional API parameter? For instance, the OpenAI API provides a `logit_bias` parameter that is applied to the token distribution during...

Stale
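The `logit_bias` idea in the request above is a small pre-sampling transform. A minimal sketch, assuming a dict of `token_id -> additive bias` as in the OpenAI API (the helper name is hypothetical and not part of TGI):

```python
# Hypothetical helper: apply an OpenAI-style `logit_bias` map to raw logits
# before sampling. Not part of TGI's API; shown only to illustrate the idea.

def apply_logit_bias(logits, logit_bias):
    """Add a per-token-id bias to raw logits.

    logits: list of floats, one per vocabulary id.
    logit_bias: dict mapping token id -> additive bias
                (e.g. a large negative value effectively bans a token).
    """
    out = list(logits)
    for token_id, bias in logit_bias.items():
        out[token_id] += bias
    return out

# Boost token 0, ban token 2:
biased = apply_logit_bias([1.0, 2.0, 3.0], {0: 5.0, 2: -100.0})
```

In a real server this step would run inside the sampling loop, after the model produces logits and before top-k/top-p filtering.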

Hi, I'm trying to deploy bert-base-uncased model by [v0.5.0](https://github.com/huggingface/text-generation-inference/tree/v0.5.0), but got an error: ValueError: BertLMHeadModel does not support `device_map='auto'` yet. ``` root@nick-test1-8zjwg-135105-worker-0:/usr/local/bin# ./text-generation-launcher --model-id bert-base-uncased 2023-04-14T07:24:23.167920Z INFO text_generation_launcher: Args {...

question

HF has contrastive sampling; ideally we could use that to sample here too.

enhancement
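For reference, the decoding method HF transformers exposes (contrastive search, via `generate(penalty_alpha=..., top_k=...)`) picks the candidate that balances model confidence against similarity to past hidden states. A toy sketch of the scoring rule, with made-up inputs (the function names here are illustrative, not TGI or transformers API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def contrastive_pick(candidates, prev_states, alpha=0.6):
    """Pick the next token by the contrastive-search rule.

    candidates: list of (token, prob, hidden_state) for the top-k tokens.
    prev_states: hidden states of tokens already generated.
    Score = (1 - alpha) * prob - alpha * max similarity to past states,
    so a degenerate (repetitive) candidate is penalized.
    """
    best = None
    for token, prob, h in candidates:
        penalty = max(cosine(h, p) for p in prev_states)
        score = (1 - alpha) * prob - alpha * penalty
        if best is None or score > best[0]:
            best = (score, token)
    return best[1]
```

With `alpha=0.6`, a candidate whose hidden state duplicates the context loses to a slightly less probable but more novel one, which is the degeneration penalty the method is built around.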

Currently, position_ids are always maintained and updated in the CausalLM case, but this is unnecessary for models like BLOOM that don't use them.
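The reason BLOOM can skip them: it uses ALiBi instead of position embeddings, adding a per-head linear bias to attention scores based on relative key-query distance, so absolute position_ids are never consumed. A minimal sketch of that bias (illustrative helpers, not the actual BLOOM code):

```python
# Sketch of ALiBi, the positional scheme BLOOM uses in place of
# position embeddings. Helper names are illustrative only.

def alibi_slopes(num_heads):
    """Per-head slopes m_i = 2^(-8*i/num_heads) for i = 1..num_heads
    (num_heads assumed to be a power of two)."""
    return [2 ** (-8 * i / num_heads) for i in range(1, num_heads + 1)]

def alibi_bias(slope, seq_len):
    """Bias added to one head's attention scores: slope * (k - q),
    i.e. 0 for the current position and increasingly negative with
    distance. Only keys k <= q exist under the causal mask."""
    return [[slope * (k - q) for k in range(q + 1)] for q in range(seq_len)]
```

Because the bias depends only on relative distance, dropping the position_ids bookkeeping for such models is purely an optimization with no behavioral change.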