lorax issues

Getting error with LoRA adapters for Qwen 2.5 0.5B Instruct

### System Info

**Docker Command:**
```
docker run --gpus all --shm-size 1g -p 80:80 -d -v /root/data:/data -e HUGGING_FACE_HUB_TOKEN='hf_###' -e MODEL_ID='${model_name}' -e TRUST_REMOTE_CODE='true' ghcr.io/predibase/lorax:main
```

**Hardware:** AWS g6.xlarge
```
+-----------------------------------------------------------------------------------------+...
```

iddogino

Quantization appears to be broken, at least for AWQ and BnB

5

### System Info

I have tried the following Lorax versions:

(official version) ghcr.io/predibase/lorax:0.12
(locally compiled) lorax:69bb989

CUDA: 12.4 12.6

### Information

- [X] Docker
- [ ] The CLI directly...

codybum

Allow adapter loading for VLMs

This PR allows adapters to be loaded in for VLMs

Infernaught

Phi 3.5 vision (4B model)

4

### Model description

LoRAX's list of officially supported models does not include any vision model. This is a big gap for a very successful product. Having lorax a critical component in our...

CheeseAndMeat

enhancement

Not able to preload local adapters

### System Info

lorax 0.12.1 container

### Information

- [X] Docker
- [ ] The CLI directly

### Tasks

- [X] An officially supported command
- [ ] My own...

xyang16

int: Enable manual trigger for server tests wf

jeffreyftang

Attention not working properly in FlashRobertaModel and FlashBertModel

### System Info

### Operating System

Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04

### Hardware used

```
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |...
```

sgiorgis

Why does prune in mergeStrategy not rescale the remaining weights?

### Feature request

I read the DARE paper, in which the authors propose rescaling the remaining parameters after pruning, and this rescaling has a significant impact on the fusion performance of multiple models....

zhujianwei-ops
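
The rescaling step raised in the DARE issue above can be illustrated independently of LoRAX. In DARE (Drop And REscale), each delta parameter is dropped with probability p and the survivors are multiplied by 1 / (1 - p), which keeps the expected value of every entry unchanged. The sketch below is plain Python under stated assumptions: `dare_prune`, `drop_rate`, `rescale`, and `seed` are hypothetical names chosen for illustration, and nothing here is taken from LoRAX's merge code.

```python
import random


def dare_prune(delta, drop_rate, rescale=True, seed=0):
    """Randomly zero entries of `delta` with probability `drop_rate`.

    With rescale=True, surviving entries are multiplied by 1 / (1 - drop_rate),
    so the expected value of each entry matches the original value.
    Hypothetical helper for illustration only; not LoRAX's implementation.
    """
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    rng = random.Random(seed)  # fixed seed so both calls below use the same mask
    keep_scale = 1.0 / (1.0 - drop_rate) if rescale else 1.0
    return [d * keep_scale if rng.random() >= drop_rate else 0.0 for d in delta]


# Toy "delta weights": all ones, so the mean makes the effect easy to read.
delta = [1.0] * 100_000

pruned = dare_prune(delta, drop_rate=0.9, rescale=False)
rescaled = dare_prune(delta, drop_rate=0.9, rescale=True)

mean_pruned = sum(pruned) / len(pruned)        # shrinks toward 1 - drop_rate
mean_rescaled = sum(rescaled) / len(rescaled)  # stays close to the original mean

print(f"mean without rescale: {mean_pruned:.3f}")
print(f"mean with rescale:    {mean_rescaled:.3f}")
```

Without rescaling, the average magnitude of the pruned delta shrinks by roughly the keep fraction, which is the kind of drift the issue is concerned about when several adapters are merged.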

Not able to host Llama3.2-11b on Azure A100 80GB server

2

### System Info

lorax_version=0.12.0

Hosting with Docker runs perfectly for Llama3.1-8b, but with Llama3.2-11b I am getting the following error:

ModuleNotFoundError: No module named 'lorax_server.utils.attention.utils'...

alokgupta1996

Throughput and Latency degradation with a single LoRA adapter on A100 40 GB

1

### System Info

---

**Setup Summary for LoRAX Benchmarking with Llama-2 Model:**

- **Hardware**: A100 40 GB (a2-highgpu-2g) on Google Kubernetes Engine (GKE)
- **Image**: ghcr.io/predibase/lorax:latest
- **Model**: `meta-llama/Llama-2-7b-hf`
-...

kaushikmitr
