
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

Results: 185 lorax issues


### System Info lorax main ### Information - [X] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported command - [ ] My own...

### System Info lorax: v0.9.0 awq: main branch transformers: v4.39.3 ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [ ] An officially supported...

bug

Currently the Helm chart supports only a subset of the LoRAX launcher args. We should support all of them as optional overrides: https://github.com/predibase/lorax/blob/main/charts/lorax/templates/deployment.yaml#L35

enhancement

### Feature request Retrieve all LoRA models from the Hugging Face Hub by base model setting, e.g. collect all LoRA adapters based on meta-llama/Meta-Llama-3-8B ### Motivation If I want to take a...

enhancement
good first issue
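The feature request above can be sketched with the `huggingface_hub` client. This is a minimal sketch, assuming the Hub indexes the model card's `base_model` field as a filterable tag (as it does for PEFT adapters); `base_model_filter` is a hypothetical helper name, not part of LoRAX.

```python
# Hypothetical helper: build tag filters selecting LoRA adapters of a base model.
def base_model_filter(base_model_id: str) -> list[str]:
    """Tag filters matching LoRA adapters that declare `base_model_id`."""
    return [f"base_model:{base_model_id}", "lora"]

if __name__ == "__main__":
    # Network call, kept separate so the pure helper above works offline.
    from huggingface_hub import HfApi  # pip install huggingface_hub

    api = HfApi()
    for model in api.list_models(
        filter=base_model_filter("meta-llama/Meta-Llama-3-8B"), limit=20
    ):
        print(model.id)
```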

### System Info Doesn't work if you make changes to the vocab ### Information - [ ] Docker - [ ] The CLI directly ### Tasks - [ ] An...

enhancement

### System Info lorax version: `4c39e8a` ### Information - When prompting Mixtral with an adapter, got the following error: `Request failed during generation: Server error: output with shape [1, 32000] doesn't...

bug

We can add back the FA1 implementation from https://github.com/huggingface/text-generation-inference/pull/624 when a compute capability of Volta or Turing is detected. This may bloat the Docker image somewhat to support both, but it seems...

enhancement
good first issue

The concurrency model currently assumes that host-side execution holding the GIL is minimal, but for loading adapters from disk into host memory in particular, we see that large adapters can...

enhancement
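One way to keep that load from stalling the server is to move it off the event loop. A minimal sketch, not LoRAX's actual implementation: this only helps if the underlying loader releases the GIL (e.g. mmap-based safetensors reads); pure-Python deserialization would need a subprocess instead. `load_adapter` stands in for the real loading routine.

```python
import asyncio

def load_adapter(path: str) -> dict:
    # Placeholder for reading adapter weights from disk into host memory.
    return {"path": path, "weights": b"..."}

async def load_adapter_async(path: str) -> dict:
    # asyncio.to_thread keeps the event loop free to schedule other requests
    # while the blocking disk read runs in a worker thread.
    return await asyncio.to_thread(load_adapter, path)

if __name__ == "__main__":
    adapter = asyncio.run(load_adapter_async("/data/adapters/my-lora"))
    print(adapter["path"])
```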

### Feature request I only see source=local available for the adapters; is this the case? Even with the models cached/pointing to them locally, there is still a callout to HF...

enhancement
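For reference, a fully local request can be sketched as below. This is a minimal sketch, assuming LoRAX's documented `adapter_id`/`adapter_source` request parameters and a server at localhost:8080 (both assumptions); setting `HF_HUB_OFFLINE=1` in the container environment additionally makes any remaining Hub callouts fail fast.

```python
import json
import urllib.request

def build_request(prompt: str, adapter_path: str) -> dict:
    return {
        "inputs": prompt,
        "parameters": {
            "adapter_id": adapter_path,  # filesystem path inside the container
            "adapter_source": "local",   # skip the Hub download path
            "max_new_tokens": 64,
        },
    }

if __name__ == "__main__":
    payload = build_request("Hello", "/data/adapters/my-lora")
    req = urllib.request.Request(
        "http://localhost:8080/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    print(urllib.request.urlopen(req).read().decode())
```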