
Large Language Model Text Generation Inference

Results 639 text-generation-inference issues

### System Info I'm on an Ubuntu server from https://console.paperspace.com/ with two A100 GPUs, but when I run the model CalderaAI/30B-Lazarus I cannot use HF Transfer, even...
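For reports like this one, a minimal launch sketch: hf_transfer is enabled through the HF_HUB_ENABLE_HF_TRANSFER environment variable passed into the container. The image tag, port mapping, and volume are assumptions; the model id and GPU count come from the issue.

```shell
# Assumed invocation of the official TGI Docker image; adjust tag/paths as needed.
# HF_HUB_ENABLE_HF_TRANSFER=1 turns on the Rust-based hf_transfer downloader.
docker run --gpus all --shm-size 1g \
  -e HF_HUB_ENABLE_HF_TRANSFER=1 \
  -v /data:/data -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id CalderaAI/30B-Lazarus \
  --num-shard 2   # shard across the two A100s
```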

### System Info
```
2023-06-15T04:27:53.010592Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.69.0
Commit sha: 5ce89059f8149eaf313c63e9ded4199670cd74bb
Docker label: sha-5ce8905
nvidia-smi:
Thu Jun 15 04:27:51 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI...
```

### Model description https://huggingface.co/replit/replit-code-v1-3b replit-code-v1-3b is a 2.7B Causal Language Model focused on Code Completion. The model has been trained on a subset of the [Stack Dedup v1.2 dataset](https://arxiv.org/abs/2211.15533). ###...

### Feature request https://github.com/Vahe1994/SpQR @TimDettmers has already announced the new SpQR technique on Twitter, which reaches 3.35 bits per parameter. I am watching the progress, and in the near term they want...

Stale

@lewtun, is this enough? Closes #458 Closes #456

### System Info Hi, Deployment of Falcon-40b-instruct as SageMaker endpoint worked well for me when following [this](https://www.philschmid.de/sagemaker-falcon-llm) tutorial. However when I try to deploy the container as part of [serial...

### System Info I get this error after running this in Docker: https://huggingface.co/huggingface/falcon-40b-gptq?text=My+name+is+Lewis+and+I+like+to huggingface_hub.utils._errors.EntryNotFoundError: No .bin weights found for model huggingface/falcon-40b-gptq and revision None. ### Information - [X] Docker -...

### Feature request I need to be able to apply a LoRA adapter to a local LLM. ### Motivation LoRA is a good tool to lightly go through and check your current...
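While native adapter support is an open request, one hedged workaround sketch is to merge the LoRA adapter into the base weights offline with peft, then point the launcher at the merged directory. The model id, adapter id, and /data path below are placeholders, not part of the issue.

```shell
# Hypothetical merge-then-serve workaround; requires transformers and peft.
python - <<'EOF'
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("base-model-id")      # placeholder id
peft_model = PeftModel.from_pretrained(base, "my-lora-adapter")   # placeholder id
# Fold the adapter deltas into the base weights and save a plain checkpoint.
peft_model.merge_and_unload().save_pretrained("/data/merged-model")
AutoTokenizer.from_pretrained("base-model-id").save_pretrained("/data/merged-model")
EOF

# Serve the merged checkpoint from the local path.
docker run --gpus all -v /data:/data -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id /data/merged-model
```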

# What does this PR do? Print error logs from launcher during integration tests ## Before submitting - [ ] This PR fixes a typo or improves the docs (you...

In your README you list the optimised architectures and say:

> Other architectures are supported on a best-effort basis using:
> AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")
> or
> AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")

Can you explain where we...
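The fallback the quoted README passage describes can be sketched as follows. `pick_auto_class` is a hypothetical helper for illustration only; the actual `from_pretrained` call is shown in a comment because it downloads weights.

```python
# Sketch of the best-effort fallback: unsupported architectures are loaded
# through the generic transformers Auto classes, with device_map="auto"
# letting accelerate spread the weights across available devices.
def pick_auto_class(is_encoder_decoder: bool) -> str:
    """Choose which transformers Auto class the fallback path would use."""
    return "AutoModelForSeq2SeqLM" if is_encoder_decoder else "AutoModelForCausalLM"

# The actual load (not executed here, since it fetches model weights):
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```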