lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
### System Info lorax latest Docker, 2x A100, Ubuntu 22.04 ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command...
### Feature request Need support for loading models that only contain `.pt` weights ### Motivation I quantized the Mixtral 8x7B model using HQQ (which produces a `qmodel.pt` file). But I am...
# What does this PR do? Fixes #391 ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if...
### Feature request Have the (async) client automatically backoff sending requests when the deployment is overloaded. ### Motivation When the async client exceeds the deployment queue capacity / rate limits,...
### System Info Running image **ghcr.io/predibase/lorax:0.9.0** in kubernetes. ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ]...
### System Info latest ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My own...
### Feature request Upgrade to the latest AWQ kernels for better performance and lower memory consumption. https://github.com/casper-hansen/AutoAWQ/pull/365 ### Motivation It seems that the latest AWQ release and its kernels changed a lot, including many...
### System Info I just noticed a very strange behaviour that may have no severe implications but is nevertheless interesting to explore: I am hosting a Mixtral model on...
### Feature request We should be able to log arbitrary headers to STDOUT for debugging / etc. We should be setting the names of the headers in the lorax startup...