lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
### System Info lorax main ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My own...
### System Info lorax: v0.9.0 awq: main branch transformers: v4.39.3 ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported...
Currently we only support a subset of LoRAX launcher args. We should support all of them as optional overrides: https://github.com/predibase/lorax/blob/main/charts/lorax/templates/deployment.yaml#L35
### Feature request Retrieve all LoRA models from the Hugging Face Hub by base model, e.g. collect all LoRA adapters based on meta-llama/Meta-Llama-3-8B. ### Motivation If I want to take a...
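A minimal sketch of the requested lookup using the `huggingface_hub` API. It assumes adapters on the Hub carry a `base_model:<repo>` tag (as PEFT-pushed adapters typically do) and that the tag format matches; verify both against the adapters you care about before relying on this.

```python
# Sketch: list LoRA adapters on the Hub that declare a given base model.
# Assumption: adapters are tagged "lora" and "base_model:<repo>" on the Hub.
from huggingface_hub import HfApi

def list_lora_adapters(base_model: str, limit: int = 100) -> list[str]:
    api = HfApi()
    models = api.list_models(
        filter=["lora", f"base_model:{base_model}"],
        limit=limit,
    )
    return [m.id for m in models]

if __name__ == "__main__":
    for adapter_id in list_lora_adapters("meta-llama/Meta-Llama-3-8B"):
        print(adapter_id)
```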
### System Info Doesn't work if you make changes to the vocab ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [ ] An...
### System Info lorax version: `4c39e8a` ### Information - When prompting Mixtral with an adapter, got the following error: `Request failed during generation: Server error: output with shape [1, 32000] doesn't...
We can add back the FA1 implementation from https://github.com/huggingface/text-generation-inference/pull/624 when a compute capability of Volta or Turing is detected. This may bloat the Docker image somewhat to support both, but it seems...
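A rough sketch of the dispatch idea, not LoRAX's actual code: pick FlashAttention v2 on Ampere and newer GPUs, fall back to an FA1-style path on Volta/Turing, and otherwise use a plain attention implementation. The function and return labels here are hypothetical.

```python
# Hypothetical attention-backend selection based on CUDA compute capability.
import torch

def select_attention_impl(device: int = 0) -> str:
    if not torch.cuda.is_available():
        return "eager"
    major, minor = torch.cuda.get_device_capability(device)
    if major >= 8:                            # Ampere, Ada, Hopper
        return "flash_attention_v2"
    if (major, minor) in ((7, 0), (7, 5)):    # Volta, Turing
        return "flash_attention_v1"
    return "eager"                            # older GPUs: no flash attention

print(select_attention_impl())
```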
The concurrency model currently assumes that host-side execution holding the GIL is minimal, but particularly when loading adapters from disk into host memory, we see that large adapters can...
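One hedged illustration of the general pattern for keeping an asyncio event loop responsive during a blocking adapter load: push the load into a worker thread. This is not LoRAX's internals; `load_adapter_weights` is a hypothetical helper, and a thread only helps for the parts of the load that actually release the GIL (e.g. raw file I/O), which is part of why large adapters remain problematic.

```python
# Sketch: offload a blocking adapter load so other requests keep progressing.
import asyncio
from safetensors.torch import load_file

def load_adapter_weights(path: str):
    # Blocking call: reads the adapter tensors from disk into host memory.
    return load_file(path)

async def load_adapter_async(path: str):
    return await asyncio.to_thread(load_adapter_weights, path)

async def main():
    weights = await load_adapter_async("adapter_model.safetensors")
    print(f"loaded {len(weights)} tensors")

if __name__ == "__main__":
    asyncio.run(main())
```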
### Feature request I only see source=local available for the adapters; is this the case? Even with the models cached locally or pointed to local paths, there is still a callout to HF...
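For context, a sketch of the workflow being described: requesting generation against an adapter that already lives on local disk. It assumes the `lorax-client` Python package and that the adapter directory is mounted into the serving container; the parameter names reflect the client as I understand it, so double-check them against the LoRAX docs.

```python
# Sketch: generate with a locally stored adapter via the lorax Python client.
from lorax import Client

client = Client("http://127.0.0.1:8080")
response = client.generate(
    "What is LoRAX?",
    adapter_id="/data/adapters/my-llama3-adapter",  # local path inside the container
    adapter_source="local",
    max_new_tokens=64,
)
print(response.generated_text)
```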