lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
### Feature request Recent Mistral models, including Mistral 7B v0.3 Instruct, ship a consolidated.safetensors file whose weight key names differ from what LoRAX expects. There are also keys like lm_head,...
### Feature request If LoRAX is based on Punica kernels, will it be able to support LoRA adapters for Mistral NeMo 12B, which has a vocab size > 130k? Currently...
### System Info ghcr.io/predibase/lorax:24cb494 ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ] My own modifications ###...
### System Info When using Predibase serverless, I see stop words included in the stream. I assumed it is supposed to stop and not include them. ### Information - [...
### System Info We are using the streaming v1 chat completions API. After some number of requests, or a request with a large enough context, the LoRAX server fails to respond. And all...
This should prevent some nasty illegal memory access errors:
1. Consolidate individual list comprehensions into a single for loop
2. Distinct code to create the lora weight pointers tensor
3. ...
### System Info I am trying to run Qwen2-7B-Instruct with AWQ quantization in a Kubernetes environment. The GPU is a single T4 (16 GB VRAM). I see that it is unable...
Previously, when loading a base model from s3: `--source s3 --model-id s3://bucket/model` The model would be downloaded to the cache path `/data/models--model`. However, when the base model is first loaded,...
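The cache-path mapping described above can be sketched as follows. This is a hypothetical illustration, assuming an HF-style `models--{name}` directory convention under `/data`; the helper name `cache_path_for` and the exact translation rule are assumptions, not LoRAX's actual code:

```python
# Hypothetical sketch: translate an s3 model id like "s3://bucket/model"
# into a local cache directory such as "/data/models--model".
# The helper name and convention are assumptions for illustration only.
def cache_path_for(model_id: str, cache_root: str = "/data") -> str:
    # Strip the "s3://" scheme and the bucket name, keep the model name.
    name = model_id.removeprefix("s3://").split("/", 1)[1]
    # HF-style cache directories replace "/" with "--" in the name.
    return f"{cache_root}/models--{name.replace('/', '--')}"

print(cache_path_for("s3://bucket/model"))  # → /data/models--model
```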
# What does this PR do? 1. Re-organize the code in BatchLoraWeights.load. This function was a bit hard to understand as there were multiple list comprehensions with almost the same looping...
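The refactor the PR describes, consolidating several list comprehensions that each iterate the same sequence into one loop, can be sketched as below. The names (`adapters`, `ranks`, `a_ptrs`, `b_ptrs`) are illustrative assumptions, not LoRAX's actual internals:

```python
# Hypothetical sketch of the refactor: instead of several list
# comprehensions that each iterate the same adapter list, a single
# for loop builds all the collections in one pass.
def load_batched(adapters):
    # Before (multiple comprehensions, each looping over `adapters`):
    #   ranks  = [a["rank"] for a in adapters]
    #   a_ptrs = [a["lora_a"] for a in adapters]
    #   b_ptrs = [a["lora_b"] for a in adapters]
    # After: one loop builds everything together.
    ranks, a_ptrs, b_ptrs = [], [], []
    for a in adapters:
        ranks.append(a["rank"])
        a_ptrs.append(a["lora_a"])
        b_ptrs.append(a["lora_b"])
    return ranks, a_ptrs, b_ptrs

adapters = [
    {"rank": 8, "lora_a": 0x100, "lora_b": 0x200},
    {"rank": 16, "lora_a": 0x300, "lora_b": 0x400},
]
print(load_batched(adapters))  # → ([8, 16], [256, 768], [512, 1024])
```

A single loop also makes it easier to validate each adapter once per iteration, rather than repeating checks in every comprehension.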
### Model description OpenAI Whisper model, e.g. medium.en ### Open source status - [x] The model implementation is available - [X] The model weights are available ### Provide useful links...