lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
# What does this PR do? Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks...
### Feature request LoRAX should support generating multiple completions for the same prompt via a single API call. ### Motivation It is helpful because when...
### System Info amazon linux 2 Running it in l40s ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command -...
### System Info AWS EC2 G6e.xlarge instance (1 L40S GPU), Linux Machine Latest lorax version as of today, using the docker command Python 3.12.6 PyTorch version: 2.4.0+cu121 CUDA version: 12.1...
### System Info `ghcr.io/predibase/lorax:f1ef0ee` ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ] My own modifications ###...
### System Info OS version: Ubuntu 22.04 Rust version (if self-compiling, cargo version): Cargo 1.75.0 Model being used (curl 127.0.0.1:8080/info | jq): If local model, please specify the kind of...
### System Info Hi, I'm currently working on using Lorax to test, benchmark, and serve my LoRA adapters, but I keep facing the same problem. The issue is that after...
## Problem: Addressing "Frozen Pain" in On-Premise LoRAX Adoption This Pull Request introduces a comprehensive **LoRAX Deployment Playbook** designed to drastically improve the on-premise adoption experience, directly addressing documented pain...