lorax

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Results 185 lorax issues

### System Info lorax latest docker, 2 A100, Ubuntu 22.04 ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command...

bug

### Feature request Need support for loading models that only contain `.pt` weights ### Motivation I quantized Mixtral 8x7b model using HQQ (which produces a `qmodel.pt` file). But I am...

bug

# What does this PR do? Fixes #391 ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if...

### Feature request Have the (async) client automatically back off sending requests when the deployment is overloaded. ### Motivation When the async client exceeds the deployment queue capacity / rate limits,...

enhancement
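
The backoff behaviour requested above could look roughly like the sketch below. It is not the lorax client's API: `OverloadedError`, `call_with_backoff`, and all parameter names are hypothetical, and the sketch assumes the caller maps an "overloaded" response (for example a rate-limit status) to that exception. It retries with exponential backoff and full jitter.

```python
import asyncio
import random


class OverloadedError(Exception):
    """Hypothetical stand-in for an 'overloaded / rate limited' response."""


async def call_with_backoff(send, *, max_retries=5, base_delay=0.5,
                            max_delay=30.0, sleep=asyncio.sleep):
    """Await `send()` and retry on OverloadedError with exponential backoff.

    `send` is any zero-argument coroutine function (assumed here to wrap a
    single request). `sleep` is injectable so the delays can be observed.
    """
    for attempt in range(max_retries + 1):
        try:
            return await send()
        except OverloadedError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error to the caller
            # Upper bound doubles each attempt, capped at max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap] so that many
            # clients do not retry in lockstep.
            await sleep(random.uniform(0, cap))
```

A caller would wrap each request, for example `await call_with_backoff(lambda: my_request(prompt))`, where `my_request` is whatever coroutine issues the request and raises `OverloadedError` when the deployment rejects it.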

### System Info Running image **ghcr.io/predibase/lorax:0.9.0** in kubernetes. ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [X] An officially supported command - [ ]...

documentation

### System Info latest ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My own...

question

### Feature request Upgrade to latest AWQ kernels for better performance and less memory consumption. https://github.com/casper-hansen/AutoAWQ/pull/365 ### Motivation It seems that latest AWQ and Kernels changed a lot, including many...

enhancement

### System Info So I just noticed this very strange behaviour that has perhaps no severe implications but nevertheless is interesting to explore: I am hosting a Mixtral model on...

question

### Feature request We should be able to log arbitrary headers to STDOUT for debugging / etc. We should be setting the names of the headers in the lorax startup...

good first issue