lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
### System Info helm=`v3.14.3` kubernetes=`v1.28.7` flux=`v2.2.3` ```c ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ]...
### System Info latest lorax ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My...
### System Info Lorax: v0.8.1 Latest Docker Build: ```bash docker pull ghcr.io/predibase/lorax:sha256-b0464d97dc19bb5791769e13e45eac120a7d6f8777e9d8a6bb290602926e3ff8 ``` It appears that when attempting to allow CORS from any source ("*"), the lorax launcher incorrectly processes...
### Feature request We noticed that [collect_lora_a()](https://github.com/predibase/lorax/blob/934922d1710bb3f8dd656e55ff1bd3941a7071bd/server/lorax_server/utils/layers.py#L632) is calling all_gather and all_reduce every time. Do you think you could give a more efficient implementation of this soon? If not, could...
### System Info Nvidia GPU A100*8 Linux OS ``` ❯ /usr/local/cuda/bin/nvcc --version nvcc: NVIDIA (R) Cuda compiler driver Copyright (c) 2005-2023 NVIDIA Corporation Built on Tue_Aug_15_22:02:13_PDT_2023 Cuda compilation tools, release...
For example, constraining the model to output a boolean, rather than a boolean embedded inside a JSON object. Should be possible with a small modification to the existing Outlines integration.
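To make the request concrete, here is a minimal sketch in plain Python of the two output shapes being contrasted. It does not call LoRAX or Outlines; the `answer` key, the schema constants, and both helper functions are hypothetical names chosen for illustration, and the regex only shows the kind of pattern a constrained decoder might enforce for a bare boolean.

```python
import json
import re

# Shape described as the current behavior: a boolean wrapped in a JSON object.
# The "answer" key is a hypothetical name used only for this illustration.
WRAPPED_SCHEMA = {
    "type": "object",
    "properties": {"answer": {"type": "boolean"}},
    "required": ["answer"],
}

# Shape being requested: the boolean itself, with no enclosing object.
BARE_SCHEMA = {"type": "boolean"}

# A bare JSON boolean has exactly two valid spellings, so a single regex
# is enough to describe the allowed output (an assumption about how a
# constrained decoder could express it, not the Outlines implementation).
BARE_BOOLEAN_PATTERN = re.compile(r"true|false")


def parse_bare_boolean(text: str) -> bool:
    """Parse model output that should be exactly `true` or `false`."""
    candidate = text.strip()
    if BARE_BOOLEAN_PATTERN.fullmatch(candidate) is None:
        raise ValueError(f"not a bare JSON boolean: {text!r}")
    return json.loads(candidate)


def parse_wrapped_boolean(text: str, key: str = "answer") -> bool:
    """Parse model output shaped like {"answer": true}."""
    value = json.loads(text)[key]
    if not isinstance(value, bool):
        raise ValueError(f"{key!r} is not a boolean: {value!r}")
    return value
```

With the bare form, the caller gets the value from `parse_bare_boolean("true")` directly, whereas the wrapped form needs an agreed key name before `parse_wrapped_boolean('{"answer": true}')` can extract the same value.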
### Feature request We should add the above param to docs so people know how to use it! ### Motivation Docs are great ### Your contribution I can ask the...
Really cool project! I'm wondering how it's different from S-LoRA? https://github.com/S-LoRA/S-LoRA
aqlm
# What does this PR do? Fixes # (issue) ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks...
Hi, I'm benchmarking LoRAX on 2*A30. I'm getting poor performance; is that normal? In the first sheet, I send requests to the base model, and the batch means the number of...