lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
### Feature request
Use something like Streamlit to run a UI that can be used to query the deployments.

### Motivation
Would be a fun addition and allows people to...
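A minimal sketch of what such a UI could look like, assuming a LoRAX server reachable at a local URL via the standard `/generate` REST endpoint; the `adapter_id` parameter optionally routes the request to a specific fine-tuned adapter:

```python
# Sketch of a Streamlit front-end for a running LoRAX server.
# Assumptions: LORAX_URL points at a live deployment, and the server exposes
# the standard /generate endpoint with an optional adapter_id parameter.
import requests
import streamlit as st

LORAX_URL = "http://localhost:8080"  # assumption: local deployment

st.title("LoRAX playground")
prompt = st.text_area("Prompt")
adapter_id = st.text_input("Adapter ID (optional)")

if st.button("Generate") and prompt:
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 64}}
    if adapter_id:
        payload["parameters"]["adapter_id"] = adapter_id
    resp = requests.post(f"{LORAX_URL}/generate", json=payload, timeout=120)
    resp.raise_for_status()
    st.write(resp.json()["generated_text"])
```

Run with `streamlit run app.py` against any live deployment.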
### System Info
I have used the following [guide](https://medium.com/@joaopcmoura/lora-serving-on-amazon-sagemaker-serve-100s-of-fine-tuned-llms-for-the-price-of-1-85034ef889c5) to deploy LoRAX to SageMaker. I am able to do so successfully using the unquantized models. Have deployed OpenHermes 2.5 successfully....
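For context, a hedged sketch of the deployment path the linked guide describes, using the SageMaker Python SDK. The image URI, role ARN, and instance type are placeholders, and the `HF_MODEL_ID` environment variable name follows the TGI container convention rather than anything confirmed here:

```python
# Sketch only: deploy a LoRAX container image to a SageMaker endpoint.
# Everything marked "placeholder" must be replaced with real account values.
from sagemaker.model import Model

role = "arn:aws:iam::000000000000:role/SageMakerExecutionRole"  # placeholder
image_uri = "<account>.dkr.ecr.us-east-1.amazonaws.com/lorax:latest"  # placeholder ECR image

model = Model(
    image_uri=image_uri,
    role=role,
    env={"HF_MODEL_ID": "teknium/OpenHermes-2.5-Mistral-7B"},  # assumed env var name
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder GPU instance
)
```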
Using `--gpus all` for `docker run` also requires `--sharded` or `--num-shard N` to be set for LoRAX, but this isn't made clear. We should add something in the docs about...
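To illustrate the pairing, a sketch via the docker-py SDK (the image tag, model ID, and port mapping are assumptions): exposing every GPU to the container only helps if LoRAX is also told to shard across them.

```python
# Sketch: launch LoRAX with all GPUs visible (count=-1 is docker-py's
# equivalent of `--gpus all`) AND an explicit shard count. Without
# `--num-shard`, the extra GPUs would sit idle.
import docker

client = docker.from_env()
container = client.containers.run(
    "ghcr.io/predibase/lorax:main",  # assumed image tag
    command=[
        "--model-id", "mistralai/Mistral-7B-Instruct-v0.1",  # example model
        "--num-shard", "4",
    ],
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    shm_size="1g",
    ports={"80/tcp": 8080},  # assumed container port
    detach=True,
)
print(container.id)
```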
### System Info
predibase

### Information
- [ ] Docker
- [ ] The CLI directly

### Tasks
- [ ] An officially supported command
- [ ] My own...
### System Info
I run your Docker image in 2 cases:
- single GPU (`--sharded false`)
- multi GPU (`--sharded false --num_shard 4`)

=> When I run single GPU, the total time...
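To make the comparison reproducible, a hedged benchmark sketch: send the same request to each deployment and average the round-trip time. The endpoint URLs, prompt, and generation parameters are placeholders.

```python
# Sketch: crude latency comparison between two LoRAX deployments via the
# standard /generate endpoint. URLs are placeholders for the two setups.
import time
import requests

def mean_latency(url: str, prompt: str, n: int = 5) -> float:
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 128}}
    start = time.perf_counter()
    for _ in range(n):
        resp = requests.post(f"{url}/generate", json=payload, timeout=300)
        resp.raise_for_status()
    return (time.perf_counter() - start) / n

for label, url in [("single-gpu", "http://host-a:8080"), ("4-shard", "http://host-b:8080")]:
    print(f"{label}: {mean_latency(url, 'Explain LoRA in one sentence.'):.2f}s/request")
```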
See FlashInfer's cascade inference post: https://flashinfer.ai/2024/01/08/cascade-inference.html
### System Info
I've run into 2 unexpected issues/inconsistencies when downloading adapters from S3.

Issue 1: With `PREDIBASE_ADAPTERS_BUCKET=sagemaker-us-east-1-000000000000`, there are several prefixes named `lorax/mistral-adapters/{id}`, with `id` being an integer from 1...
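A quick way to inspect that layout (a boto3 sketch; the bucket name is the anonymized one from the report, and AWS credentials are assumed to be configured):

```python
# Sketch: list the numeric adapter prefixes under lorax/mistral-adapters/
# in the bucket named by PREDIBASE_ADAPTERS_BUCKET.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(
    Bucket="sagemaker-us-east-1-000000000000",  # anonymized bucket from the report
    Prefix="lorax/mistral-adapters/",
    Delimiter="/",
)
for entry in resp.get("CommonPrefixes", []):
    print(entry["Prefix"])  # e.g. lorax/mistral-adapters/1/
```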
### System Info
Latest Lorax version

### Information
- [X] Docker
- [ ] The CLI directly

### Tasks
- [X] An officially supported command
- [ ] My own...
### Model description
Hi, my company has trained its own 7B model and we want to deploy it with LoRAX. Can you outline the key steps for supporting a custom model in LoRAX?...
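Assuming the custom 7B model uses an architecture LoRAX already supports, and that the server has been launched with `--model-id` pointing at the model's weights (a Hugging Face repo or local path), querying it through the `lorax-client` package might look like this sketch:

```python
# Sketch: query a custom model already being served by LoRAX.
# Assumptions: the server is running locally and lorax-client is installed.
from lorax import Client

client = Client("http://127.0.0.1:8080")  # assumed local deployment
response = client.generate(
    "Summarize our product in one sentence.",
    max_new_tokens=64,
    # adapter_id="my-org/my-adapter",  # optional: route to a fine-tuned adapter
)
print(response.generated_text)
```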
Added a list of the exported metrics to the README. It would be nice to extend the table with further info, such as the metric type and a description for each metric.
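The type and description columns could be scraped straight from a running server, assuming LoRAX exposes the standard Prometheus `/metrics` route: the `# TYPE` and `# HELP` comment lines in the exposition format carry exactly those two fields. A sketch that prints markdown table rows:

```python
# Sketch: build README table rows (name | type | description) from the
# Prometheus exposition text. The URL is a placeholder for a live server.
import requests

text = requests.get("http://localhost:8080/metrics", timeout=10).text

descriptions, types = {}, {}
for line in text.splitlines():
    parts = line.split(" ", 3)  # e.g. ["#", "TYPE", "<name>", "counter"]
    if len(parts) == 4 and parts[0] == "#":
        if parts[1] == "HELP":
            descriptions[parts[2]] = parts[3]
        elif parts[1] == "TYPE":
            types[parts[2]] = parts[3]

for name in sorted(types):
    print(f"| `{name}` | {types[name]} | {descriptions.get(name, '')} |")
```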