lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Results: 178 lorax issues, sorted by most recently updated

### Feature request/question Expose an ENV var/flag in `lorax-server` and `lorax-launcher` to set the base path of adapters during inference. As a workaround, we currently set HUGGINGFACE_HUB_CACHE=/home/adapters. With reference...

enhancement
good first issue

Currently we support multiple ranks per batch via a loop, but this reduces the batching effect and makes the process infeasible for CUDA graphs. Instead, we can pad out the buffers...

enhancement
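
The padding idea described in the issue above can be sketched roughly as follows. This is an illustrative sketch only, not LoRAX's actual implementation; the shapes and function names are assumptions.

```python
import torch

def pad_and_stack(adapters_a, adapters_b, hidden_size):
    """Pad per-adapter LoRA matrices to the batch's max rank and stack them.

    adapters_a[i]: (rank_i, hidden_size), adapters_b[i]: (hidden_size, rank_i).
    Zero-padded rows/columns contribute nothing to the product, so the result
    is numerically identical to the per-rank loop while giving fixed-shape
    buffers that CUDA graphs can capture.
    """
    max_rank = max(a.shape[0] for a in adapters_a)
    a_buf = torch.zeros(len(adapters_a), max_rank, hidden_size)
    b_buf = torch.zeros(len(adapters_b), hidden_size, max_rank)
    for i, (a, b) in enumerate(zip(adapters_a, adapters_b)):
        r = a.shape[0]
        a_buf[i, :r, :] = a
        b_buf[i, :, :r] = b
    return a_buf, b_buf

# The per-token LoRA delta then becomes a pair of matmuls over fixed shapes:
# delta[i] = (x[i] @ a_buf[i].T) @ b_buf[i].T
```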

**System Info:** Python - 3.11.5 CUDA - 12.2 GPU: A100, Driver Version: 535.104.05 #GPU - 2 **Command used** model=mistralai/Mistral-7B-Instruct-v0.1 volume=$PWD/data sudo podman run --gpus all --shm-size 1g -p 8080:80 -v...

question

Good thread on it here: https://www.reddit.com/r/LocalLLaMA/comments/1bgej75/control_vectors_added_to_llamacpp/ Given how parameter efficient control vectors are, they're a perfect candidate for something like LoRAX where you might want to serve many different such...

enhancement

Following error occurred at request time: ``` CUDA error: an illegal memory access was encountered ``` Repro context: - Mixtral-8x7b - Adapter (rank 8) - Long prompt - Sharded (2+...

bug

### Feature request When streaming a prompt response, the last message does not include the time to process the request. Would like to request that we include that information in...

enhancement
good first issue
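
For context, a streamed request with the Python client looks roughly like the sketch below, assuming the `lorax-client` package's streaming interface; the `request_duration` field mentioned in the comment is hypothetical and only illustrates the information this issue asks for.

```python
from lorax import Client

client = Client("http://127.0.0.1:8080")

text = ""
for response in client.generate_stream("What is deep learning?", max_new_tokens=64):
    if not response.token.special:
        text += response.token.text
    if response.details is not None:
        # The final streamed message carries generation details such as
        # finish_reason and generated_tokens; this issue asks for the total
        # request processing time to be reported here as well, e.g. via a
        # (hypothetical) field like response.details.request_duration.
        print(response.details.finish_reason, response.details.generated_tokens)
```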

Hi - How do I call OpenAI GPT (say gpt-4) and Google Gemini models through LoRAX? An example code snippet would really help. Thanks, Sekhar H.

question

Here is my code: ``` model=/data/vicuna-13b/vicuna-13b-v1.5/ docker run --gpus all --shm-size 1g -p 8080:80 -v /data/:/data \ ghcr.io/predibase/lorax:latest --model-id $model --sharded true --num-shard 2 \ --adapter-id baruga/alpaca-lora-13b ``` Here is...

question

### Feature request https://github.com/huggingface/text-generation-inference/issues/1633 ### Motivation Throughput and latency ### Your contribution @tgaddair what do you think?

enhancement

### Feature request DoRA introduces more overhead than pure LoRA, so it is recommended to merge the weights for inference (see https://github.com/huggingface/peft/blob/main/docs/source/developer_guides/lora.md#weight-decomposed-low-rank-adaptation-dora); it seems that this method will break current...

enhancement
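
For reference, the merge-for-inference path mentioned in the linked PEFT docs looks roughly like the sketch below; the model name and adapter path are placeholders. Note that merging folds the adapter into the base weights, producing a single fused model, which is at odds with serving many adapters dynamically on top of one shared base.

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model and a DoRA adapter (one trained with LoraConfig(use_dora=True)).
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")
model = PeftModel.from_pretrained(base, "path/to/dora-adapter")

# Folding the decomposed weights into the base removes DoRA's extra
# per-forward overhead, at the cost of a single merged checkpoint.
model = model.merge_and_unload()
```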