lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
### Feature request
Use something like Streamlit to run a UI that can be used to query the deployments.

### Motivation
Would be a fun addition and allows people to...
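A minimal sketch of what such a UI could look like, assuming a LoRAX server reachable at a local URL via the standard `/generate` REST endpoint; the `adapter_id` parameter optionally routes the request to a specific fine-tuned adapter:

```python
# Sketch of a Streamlit front-end for a running LoRAX server.
# Assumptions: LORAX_URL points at a live deployment, and the server exposes
# the standard /generate endpoint with an optional adapter_id parameter.
import requests
import streamlit as st

LORAX_URL = "http://localhost:8080"  # assumption: local deployment

st.title("LoRAX playground")
prompt = st.text_area("Prompt")
adapter_id = st.text_input("Adapter ID (optional)")

if st.button("Generate") and prompt:
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 64}}
    if adapter_id:
        payload["parameters"]["adapter_id"] = adapter_id
    resp = requests.post(f"{LORAX_URL}/generate", json=payload, timeout=120)
    resp.raise_for_status()
    st.write(resp.json()["generated_text"])
```

Run with `streamlit run app.py` against any live deployment.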
### System Info
I have used the following [guide](https://medium.com/@joaopcmoura/lora-serving-on-amazon-sagemaker-serve-100s-of-fine-tuned-llms-for-the-price-of-1-85034ef889c5) to deploy LoRAX to SageMaker. I am able to do so successfully using the unquantized models. Have deployed OpenHermes 2.5 successfully....
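For context, a hedged sketch of the deployment path the linked guide describes, using the SageMaker Python SDK. The image URI, role ARN, and instance type are placeholders, and the `HF_MODEL_ID` environment variable name follows the TGI container convention rather than anything confirmed here:

```python
# Sketch only: deploy a LoRAX container image to a SageMaker endpoint.
# Everything marked "placeholder" must be replaced with real account values.
from sagemaker.model import Model

role = "arn:aws:iam::000000000000:role/SageMakerExecutionRole"  # placeholder
image_uri = "<account>.dkr.ecr.us-east-1.amazonaws.com/lorax:latest"  # placeholder ECR image

model = Model(
    image_uri=image_uri,
    role=role,
    env={"HF_MODEL_ID": "teknium/OpenHermes-2.5-Mistral-7B"},  # assumed env var name
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder GPU instance
)
```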
Using `--gpus all` for `docker run` also requires `--sharded` or `--num-shard N` to be set for LoRAX, but this isn't made clear. We should add something in the docs about...
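To illustrate the pairing, a sketch via the docker-py SDK (the image tag, model ID, and port mapping are assumptions): exposing every GPU to the container only helps if LoRAX is also told to shard across them.

```python
# Sketch: launch LoRAX with all GPUs visible (count=-1 is docker-py's
# equivalent of `--gpus all`) AND an explicit shard count. Without
# `--num-shard`, the extra GPUs would sit idle.
import docker

client = docker.from_env()
container = client.containers.run(
    "ghcr.io/predibase/lorax:main",  # assumed image tag
    command=[
        "--model-id", "mistralai/Mistral-7B-Instruct-v0.1",  # example model
        "--num-shard", "4",
    ],
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
    shm_size="1g",
    ports={"80/tcp": 8080},  # assumed container port
    detach=True,
)
print(container.id)
```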
### System Info
predibase

### Information
- [ ] Docker
- [ ] The CLI directly

### Tasks
- [ ] An officially supported command
- [ ] My own...
### System Info
I run your Docker image in 2 cases:
- single GPU (`--sharded false`)
- multi GPU (`--sharded false --num_shard 4`)

=> When I run single GPU, the total time...
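To make the comparison reproducible, a hedged benchmark sketch: send the same request to each deployment and average the round-trip time. The endpoint URLs, prompt, and generation parameters are placeholders.

```python
# Sketch: crude latency comparison between two LoRAX deployments via the
# standard /generate endpoint. URLs are placeholders for the two setups.
import time
import requests

def mean_latency(url: str, prompt: str, n: int = 5) -> float:
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 128}}
    start = time.perf_counter()
    for _ in range(n):
        resp = requests.post(f"{url}/generate", json=payload, timeout=300)
        resp.raise_for_status()
    return (time.perf_counter() - start) / n

for label, url in [("single-gpu", "http://host-a:8080"), ("4-shard", "http://host-b:8080")]:
    print(f"{label}: {mean_latency(url, 'Explain LoRA in one sentence.'):.2f}s/request")
```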
See FlashInfer's cascade inference post: https://flashinfer.ai/2024/01/08/cascade-inference.html
### System Info
I've run into 2 unexpected issues/inconsistencies when downloading adapters from S3.

Issue 1: With `PREDIBASE_ADAPTERS_BUCKET=sagemaker-us-east-1-000000000000`, there are several prefixes named `lorax/mistral-adapters/{id}`, with `id` being an integer from 1...
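A quick way to inspect that layout (a boto3 sketch; the bucket name is the anonymized one from the report, and AWS credentials are assumed to be configured):

```python
# Sketch: list the numeric adapter prefixes under lorax/mistral-adapters/
# in the bucket named by PREDIBASE_ADAPTERS_BUCKET.
import boto3

s3 = boto3.client("s3")
resp = s3.list_objects_v2(
    Bucket="sagemaker-us-east-1-000000000000",  # anonymized bucket from the report
    Prefix="lorax/mistral-adapters/",
    Delimiter="/",
)
for entry in resp.get("CommonPrefixes", []):
    print(entry["Prefix"])  # e.g. lorax/mistral-adapters/1/
```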
### System Info
Latest Lorax version

### Information
- [X] Docker
- [ ] The CLI directly

### Tasks
- [X] An officially supported command
- [ ] My own...
### Model description
Hi, my company has trained its own 7B model and we want to deploy it with LoRAX. Can you outline the key steps for supporting a custom model in LoRAX?...
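Assuming the custom 7B model uses an architecture LoRAX already supports, and that the server has been launched with `--model-id` pointing at the model's weights (a Hugging Face repo or local path), querying it through the `lorax-client` package might look like this sketch:

```python
# Sketch: query a custom model already being served by LoRAX.
# Assumptions: the server is running locally and lorax-client is installed.
from lorax import Client

client = Client("http://127.0.0.1:8080")  # assumed local deployment
response = client.generate(
    "Summarize our product in one sentence.",
    max_new_tokens=64,
    # adapter_id="my-org/my-adapter",  # optional: route to a fine-tuned adapter
)
print(response.generated_text)
```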
Added a list of the exported metrics to the README. It would be nice to extend the table with further info, such as the metric type and a description for each metric.
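The type and description columns could be scraped straight from a running server, assuming LoRAX exposes the standard Prometheus `/metrics` route: the `# TYPE` and `# HELP` comment lines in the exposition format carry exactly those two fields. A sketch that prints markdown table rows:

```python
# Sketch: build README table rows (name | type | description) from the
# Prometheus exposition text. The URL is a placeholder for a live server.
import requests

text = requests.get("http://localhost:8080/metrics", timeout=10).text

descriptions, types = {}, {}
for line in text.splitlines():
    parts = line.split(" ", 3)  # e.g. ["#", "TYPE", "<name>", "counter"]
    if len(parts) == 4 and parts[0] == "#":
        if parts[1] == "HELP":
            descriptions[parts[2]] = parts[3]
        elif parts[1] == "TYPE":
            types[parts[2]] = parts[3]

for name in sorted(types):
    print(f"| `{name}` | {types[name]} | {descriptions.get(name, '')} |")
```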