
Large Language Model Text Generation Inference

Results 639 text-generation-inference issues

### System Info I'm on an Ubuntu server from https://console.paperspace.com/ with two A100 GPUs, but when I run the model CalderaAI/30B-Lazarus I cannot use HF Transfer, even...
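For reports like this one, a minimal launch sketch: hf_transfer is enabled through the HF_HUB_ENABLE_HF_TRANSFER environment variable passed into the container. The image tag, port mapping, and volume are assumptions; the model id and GPU count come from the issue.

```shell
# Assumed invocation of the official TGI Docker image; adjust tag/paths as needed.
# HF_HUB_ENABLE_HF_TRANSFER=1 turns on the Rust-based hf_transfer downloader.
docker run --gpus all --shm-size 1g \
  -e HF_HUB_ENABLE_HF_TRANSFER=1 \
  -v /data:/data -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id CalderaAI/30B-Lazarus \
  --num-shard 2   # shard across the two A100s
```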

### System Info
```
2023-06-15T04:27:53.010592Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: 5ce89059f8149eaf313c63e9ded4199670cd74bb
Docker label: sha-5ce8905
nvidia-smi:
Thu Jun 15 04:27:51 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI...
```

### Model description https://huggingface.co/replit/replit-code-v1-3b replit-code-v1-3b is a 2.7B Causal Language Model focused on Code Completion. The model has been trained on a subset of the [Stack Dedup v1.2 dataset](https://arxiv.org/abs/2211.15533). ###...

### Feature request https://github.com/Vahe1994/SpQR @TimDettmers has already announced the new SpQR technique on Twitter, which reaches 3.35 bits per parameter. I am watching the progress, and in the near term they want...

Stale

@lewtun, is this enough? Closes #458 Closes #456

### System Info Hi, Deployment of Falcon-40b-instruct as SageMaker endpoint worked well for me when following [this](https://www.philschmid.de/sagemaker-falcon-llm) tutorial. However when I try to deploy the container as part of [serial...

### System Info I get this error after running this in Docker: https://huggingface.co/huggingface/falcon-40b-gptq?text=My+name+is+Lewis+and+I+like+to huggingface_hub.utils._errors.EntryNotFoundError: No .bin weights found for model huggingface/falcon-40b-gptq and revision None. ### Information - [X] Docker -...

### Feature request I need to be able to apply a LoRA adapter to a local LLM. ### Motivation LoRA is a good tool to lightly go through and check your current...
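While native adapter support is an open request, one hedged workaround sketch is to merge the LoRA adapter into the base weights offline with peft, then point the launcher at the merged directory. The model id, adapter id, and /data path below are placeholders, not part of the issue.

```shell
# Hypothetical merge-then-serve workaround; requires transformers and peft.
python - <<'EOF'
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("base-model-id")      # placeholder id
peft_model = PeftModel.from_pretrained(base, "my-lora-adapter")   # placeholder id
# Fold the adapter deltas into the base weights and save a plain checkpoint.
peft_model.merge_and_unload().save_pretrained("/data/merged-model")
AutoTokenizer.from_pretrained("base-model-id").save_pretrained("/data/merged-model")
EOF

# Serve the merged checkpoint from the local path.
docker run --gpus all -v /data:/data -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data/merged-model
```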

# What does this PR do? Print error logs from launcher during integration tests ## Before submitting - [ ] This PR fixes a typo or improves the docs (you...

In your README you list the optimised architectures and say:

> Other architectures are supported on a best-effort basis using:
> AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")
> or
> AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")

Can you explain where we...
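The fallback the quoted README passage describes can be sketched as follows. `pick_auto_class` is a hypothetical helper for illustration only; the actual `from_pretrained` call is shown in a comment because it downloads weights.

```python
# Sketch of the best-effort fallback: unsupported architectures are loaded
# through the generic transformers Auto classes, with device_map="auto"
# letting accelerate spread the weights across available devices.
def pick_auto_class(is_encoder_decoder: bool) -> str:
    """Choose which transformers Auto class the fallback path would use."""
    return "AutoModelForSeq2SeqLM" if is_encoder_decoder else "AutoModelForCausalLM"

# The actual load (not executed here, since it fetches model weights):
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```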