lorax
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
# What does this PR do?

Fixes #433

## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if...
# What does this PR do?

Loading the tokenizer with remote code from an unregistered class requires interactive user input to confirm yes/no, which breaks normal (non-interactive) processing. Fixes...
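For context, a minimal sketch (not the actual PR change; the model id below is a placeholder): passing `trust_remote_code=True` when loading the tokenizer tells transformers to skip the interactive confirmation entirely.

```python
# Sketch only: trust_remote_code=True avoids the interactive yes/no prompt
# that transformers issues when a tokenizer ships custom, unregistered code.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "some-org/model-with-custom-tokenizer",  # hypothetical model id
    trust_remote_code=True,  # do not block on interactive confirmation
)
```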
# (WIP) Fix for the LM_HEAD issue

**Root cause**: the error is caused by incorrect segments passed to the `lora_b_sgmv` kernel during the prefill stage. This happens because we do...
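For illustration only (this is not the LoRAX implementation, and the helper name is made up): SGMV-style kernels expect contiguous token segments that share an adapter, so during prefill the segment boundaries have to be built from cumulative prompt lengths rather than one entry per request.

```python
# Illustrative sketch of prefill segment construction for an SGMV kernel.
import torch

def build_segments(prompt_lens, adapter_ids):
    """Return (segment_starts, segment_ends, segment_adapter_ids) for a batch."""
    starts, ends, adapters = [], [], []
    offset = 0
    for length, adapter in zip(prompt_lens, adapter_ids):
        starts.append(offset)
        ends.append(offset + length)
        adapters.append(adapter)
        offset += length
    return torch.tensor(starts), torch.tensor(ends), torch.tensor(adapters)

# Example: two requests with prompts of 5 and 3 tokens using adapters 0 and 1.
print(build_segments([5, 3], [0, 1]))  # (tensor([0, 5]), tensor([5, 8]), tensor([0, 1]))
```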
If speculative decoding is in use and the user wants to generate up to the model's maximum positional embeddings, runtime errors can arise, causing a CUDA device-side...
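A hedged sketch of the kind of clamping that avoids overrunning the positional budget; the function and parameter names here are assumptions, not LoRAX's actual API. With speculative decoding, each step may append extra draft tokens, so the effective budget must leave headroom below the model's max positional embeddings.

```python
# Illustrative sketch: leave room for speculative draft tokens appended per step.
def clamp_max_new_tokens(prompt_len, requested_new_tokens,
                         max_position_embeddings, num_speculative):
    budget = max_position_embeddings - prompt_len - num_speculative
    return max(0, min(requested_new_tokens, budget))

# Example: a 4000-token prompt against a 4096-position model with 4 draft tokens.
print(clamp_max_new_tokens(4000, 200, 4096, 4))  # 92
```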
### System Info

lorax_version: "a7e8175"
Python 3.10.8
Platform: ml.g5.16xlarge (AWS)

When deploying the Docker container with source "s3" and model_id "mistralai/Mistral-7B-Instruct-v0.2" (`lorax-launcher --port 8080 --source "s3"`), it failed...
### Feature request

Implement `v1/models`, like the OpenAI API, to list the available local **loras**. This is dependent on #199.

There is also a hurdle to this: a user may have multiple...
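If the endpoint were implemented, clients could reuse the standard OpenAI SDK to enumerate adapters. A sketch, assuming the server listens on port 8080 and the endpoint mirrors the OpenAI response shape:

```python
# Sketch of proposed client usage; the v1/models endpoint does not exist yet,
# and the base URL/port are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="EMPTY")

# With the feature implemented, this would return the base model plus the
# locally available LoRA adapters.
for model in client.models.list():
    print(model.id)
```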
During fine-tuning, special tokens may be added that are specific to the adapter. During decoding, we should use those special tokens and ensure the correct stop tokens,...
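A small sketch of how adapter-specific tokens could be discovered by diffing the adapter's tokenizer against the base tokenizer; the adapter id below is hypothetical, and this is an illustration rather than the planned implementation.

```python
# Illustrative sketch: an adapter repo may ship its own tokenizer with extra
# special tokens and a different EOS token.
from transformers import AutoTokenizer

base_tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
adapter_tok = AutoTokenizer.from_pretrained("some-org/finetuned-adapter")  # hypothetical

# Tokens the adapter added on top of the base vocabulary; decoding should use
# the adapter's EOS/stop tokens rather than the base model's.
added = set(adapter_tok.get_vocab()) - set(base_tok.get_vocab())
print("adapter-specific tokens:", added)
print("adapter eos:", adapter_tok.eos_token)
```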
WIP project roadmap for LoRAX. We'll continue to update this over time.

# v0.10

- [ ] Speculative decoding adapters
- [ ] AQLM

# v0.11

- [ ] Prefix...
It seems that if the server is flooded with requests for a new adapter it needs to download, a race condition can arise leading to CUDA errors. Needs more investigation.
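One common way to rule out such a race (a sketch only, not necessarily how LoRAX will fix it) is to serialize downloads of the same adapter behind a per-adapter lock, so a flood of concurrent requests triggers at most one download; `download_adapter` below is a hypothetical helper.

```python
# Illustrative sketch: at most one concurrent download per adapter id.
import asyncio
from collections import defaultdict

_download_locks = defaultdict(asyncio.Lock)
_downloaded = set()

async def ensure_adapter_downloaded(adapter_id: str):
    async with _download_locks[adapter_id]:
        if adapter_id in _downloaded:
            return
        await download_adapter(adapter_id)  # hypothetical download helper
        _downloaded.add(adapter_id)
```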