lorax issues

Add helm chart to OCI repo

3

### System Info helm=`v3.14.3` kubernetes=`v1.28.7` flux=`v2.2.3` ```c ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ]...

NerdyShawn

enhancement

good first issue

Error: Warmup(Generation("Not enough memory to handle 1024 prefill tokens. You need to decrease `--max-batch-prefill-tokens`")

2

### System Info latest lorax ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My...

KrisWongz

CORS allow origin doesn't support wildcard *

1

### System Info Lorax: v0.8.1 Latest Docker Build: ```bash docker pull ghcr.io/predibase/lorax:sha256-b0464d97dc19bb5791769e13e45eac120a7d6f8777e9d8a6bb290602926e3ff8 ``` It appears that when attempting to allow CORS from any source ("*"), the lorax launcher incorrectly processes...

OptimusLime

enhancement

Efficient implementation of all_reduce and all_gather for collect_lora_a

2

### Feature request We noticed that [collect_lora_a()](https://github.com/predibase/lorax/blob/934922d1710bb3f8dd656e55ff1bd3941a7071bd/server/lorax_server/utils/layers.py#L632) is calling all_gather and all_reduce every time. Do you think you could give a more efficient implementation of this soon? If not, could...

hayleyhu

enhancement

Sample command with mistral-7b failed

10

### System Info Nvidia GPU A100*8 Linux OS ``` ❯ /usr/local/cuda/bin/nvcc --version nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2023 NVIDIA Corporation Built on Tue_Aug_15_22:02:13_PDT_2023 Cuda compilation tools, release...

hayleyhu

question

Support constrained generation of valid Python types

For example, constraining the model to output a boolean, rather than a boolean embedded inside a JSON object. Should be possible with a small modification to the existing Outlines integration.

jeffreyftang

enhancement
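
One way to picture the request above is as a mapping from a bare Python type to a pattern that a regex-guided decoder could enforce. The following is only a minimal sketch under that assumption: `TYPE_PATTERNS`, `pattern_for`, and `is_valid_output` are hypothetical names for illustration and are not part of LoRAX or Outlines, and the patterns cover only a few simple literals.

```python
import re

# Hypothetical mapping from bare Python types to regular expressions.
# These names and patterns are illustrative assumptions, not LoRAX or Outlines APIs.
TYPE_PATTERNS = {
    bool: r"(True|False)",
    int: r"-?(0|[1-9][0-9]*)",
    float: r"-?(0|[1-9][0-9]*)\.[0-9]+",
}


def pattern_for(py_type):
    """Return the regex a constrained decoder could enforce for py_type."""
    try:
        return TYPE_PATTERNS[py_type]
    except KeyError:
        raise ValueError(f"unsupported type: {py_type!r}")


def is_valid_output(text, py_type):
    """Check that text is exactly one literal of py_type and nothing else."""
    return re.fullmatch(pattern_for(py_type), text) is not None
```

Under this sketch, a bare `True` passes for `bool`, while a boolean wrapped in a JSON object such as `{"answer": true}` does not, which is the distinction the issue describes.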

Add in "--adapter-memory-fraction" to docs

### Feature request We should add the above param to docs so people know how to use it! ### Motivation Docs are great ### Your contribution I can ask the...

noah-yoshida

documentation

good first issue

how does this differ from s-Lora?

9

Really cool project! I'm wondering how it's different from S-LoRA? https://github.com/S-LoRA/S-LoRA

priyankat99

question

aqlm

2

# What does this PR do? Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks...

flozi00

performance issue

4

Hi, I'm benchmarking lora-x on 2*A30. I get poor performance; is that normal? In the first sheet, I send requests to the base model, and the batch means the number of...

sleepwalker2017

question
