lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
marlin
# What does this PR do? Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks...
### Feature request **Description** I want to make it easier for new people to use Lorax, especially those coming from other tools. Right now, they have to set max-input-length and...
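A minimal sketch of how such defaults might be derived, assuming a hypothetical helper that reads the model's config.json; the helper name and fallback values are illustrative, not Lorax's actual startup logic:

```python
import json

def derive_length_defaults(config_path: str) -> tuple[int, int]:
    """Hypothetical helper: pick max-input-length / max-total-tokens
    from the model's own context window instead of requiring flags."""
    with open(config_path) as f:
        cfg = json.load(f)
    # most HF configs expose the context window as max_position_embeddings
    ctx = cfg.get("max_position_embeddings", 2048)
    max_input_length = ctx - 1   # leave room for at least one generated token
    max_total_tokens = ctx
    return max_input_length, max_total_tokens
```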
marlin
### Feature request https://github.com/IST-DASLab/marlin ### Motivation Faster inference. ### Your contribution I will open a PR tomorrow; opening this issue for tracking, or in case someone else gets to it first.
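For context, Marlin is a fused FP16×INT4 GEMM kernel. A naive reference sketch of the computation it accelerates, i.e. groupwise dequantization followed by a matmul; the packed weight layout and the fusion are what Marlin actually optimizes, and this sketch attempts neither:

```python
import torch

def fp16_int4_matmul_reference(a, q, scales, group_size=128):
    """What a fused FP16 x INT4 kernel computes, spelled out naively.

    a:      [m, k] fp16 activations
    q:      [k, n] int4 weights, here already unpacked to ints in [-8, 7]
    scales: [k // group_size, n] fp16 per-group scales
    """
    # each group of `group_size` rows shares one scale per output column
    w = q.to(a.dtype) * scales.repeat_interleave(group_size, dim=0)
    return a @ w  # Marlin fuses the dequantization above into the GEMM itself
```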
The download process [here](https://github.com/predibase/lorax/blob/main/launcher/src/main.rs#L800) is essentially a no-op if the model weights are already present, but it can still add several seconds of latency to startup. We can make a quick...
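The launcher itself is Rust, but the check is simple; a Python sketch of the idea using huggingface_hub's offline mode, assuming the standard HF cache layout:

```python
from huggingface_hub import snapshot_download
from huggingface_hub.utils import LocalEntryNotFoundError

def weights_already_cached(model_id: str) -> bool:
    """True if a complete snapshot is already in the local HF cache,
    in which case the launcher could skip the download step entirely."""
    try:
        # local_files_only=True never touches the network; it raises
        # if any requested file is missing from the cache
        snapshot_download(model_id, local_files_only=True)
        return True
    except LocalEntryNotFoundError:
        return False
```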
LongLM
### Feature request https://arxiv.org/pdf/2401.01325.pdf Abstract: This work elicits LLMs’ inherent ability to handle long contexts without fine-tuning. The limited length of training sequences may limit the application...
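The core trick in this paper (SelfExtend) is remapping relative positions at inference time: nearby tokens keep their exact positions, while distant tokens are merged into coarse groups by a floor division. A simplified sketch of that mapping as I read the paper, with the boundary shift chosen to keep positions roughly continuous; the window and group sizes are illustrative:

```python
def self_extend_rel_pos(q_pos: int, k_pos: int,
                        neighbor_window: int = 512,
                        group_size: int = 4) -> int:
    """Remapped relative position between a query and an earlier key."""
    rel = q_pos - k_pos
    if rel < neighbor_window:
        return rel  # nearby tokens: ordinary positions, attention unchanged
    # distant tokens: floor-divide positions into groups, then shift so the
    # grouped range starts where the neighbor window ends
    shift = neighbor_window - neighbor_window // group_size
    return q_pos // group_size - k_pos // group_size + shift
```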
### System Info 2024-01-10T09:14:20.356771Z INFO lorax_launcher: Args { model_id: "/data/Llama-2-7b-chat-hf", adapter_id: "/data/llama2-lora", source: "hub", adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, compile: false, dtype: None,...
### Feature request The developments in the robotics community around RT-2 show a lot of potential for VLMs, but the hardware constraints small developers face make it difficult to deploy...
### Feature request I have downloaded the model and want to run it from the local files. The sample is: docker run --gpus all --shm-size 1g -p 8080:80 -v /data/model/:/data/...
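Once the container is serving the locally mounted weights, requests can be sent over the REST API. A sketch against the /generate endpoint, assuming the container from the docker command above is listening on port 8080 and the response shape matches the standard generate endpoint; the prompt and parameter values are placeholders:

```python
import requests

resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": "What is LoRA?",
        "parameters": {"max_new_tokens": 64},
    },
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```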
### System Info Lorax version: 0.4.1 Lorax_launcher: 0.1.0 Model: mistralai/Mixtral-8x7B-Instruct-v0.1 GPUs: 3090 (24 GB), 3060 (12 GB) ### Information - [X] Docker - [ ] The CLI directly ### Tasks...