[Fix] Fix bugs and refactor LoRA code for better scalability.
Motivation
https://github.com/sgl-project/sglang/issues/3414 reports limited LoRA model support compared to test_generation_models.py. This PR refines the LoRA dataclasses and tests several trending LoRA models from Hugging Face, uncovering some bugs and adding warnings so they can be addressed later.
Modifications
- [x] Added an independent folder for LoRA-related tests.
- [x] Introduced LoRAAdaptor and LoRAModelCase dataclasses in utils.py for cleaner test case management in test_lora_backend.py (sketched after this list).
- [x] Made tolerances in test_lora_backend.py dynamic, set per model and per adaptor (see the same sketch below).
- [x] Fixed an issue where some LoRA adaptors ship only gate weights and no up weights by initializing the missing up weights to zero in lora.py (sketched below).
- [x] Added a temporary restriction in lora_config.py that rejects embedding and LM head LoRA weights, which are not supported yet (sketched below).
- [x] Added a skeleton for multi-LoRA backend tests.
- [x] Fixed the handling of empty responses and ensured special tokens are skipped during decoding in runner.py (sketched below).
- [x] Identified an accuracy problem when using flashinfer as the backend.
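For reference, here is a minimal sketch of what the new test-case structures and dynamic tolerances could look like; the field names, defaults, and the helper function are illustrative assumptions, not the exact definitions in utils.py / test_lora_backend.py:

```python
# Hypothetical sketch of the test-case dataclasses and per-case tolerance
# lookup; names and defaults are assumptions for illustration only.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoRAAdaptor:
    name: str                                  # Hugging Face id of the LoRA adaptor
    prefill_tolerance: Optional[float] = None  # per-adaptor overrides
    decode_tolerance: Optional[float] = None
    rouge_l_tolerance: Optional[float] = None


@dataclass
class LoRAModelCase:
    base: str                                  # Hugging Face id of the base model
    adaptors: List[LoRAAdaptor] = field(default_factory=list)
    prefill_tolerance: float = 1e-1            # model-level defaults
    decode_tolerance: float = 1e-1
    rouge_l_tolerance: float = 1.0


def resolve_prefill_tolerance(case: LoRAModelCase, adaptor: LoRAAdaptor) -> float:
    """Use the adaptor-specific tolerance when set, otherwise the model default."""
    if adaptor.prefill_tolerance is not None:
        return adaptor.prefill_tolerance
    return case.prefill_tolerance
```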
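The gate-only fix can be pictured roughly as follows; the function and tensor names are assumptions, and the actual change lives in lora.py:

```python
# Hypothetical sketch: when an adaptor provides only gate_proj LoRA weights,
# pad the fused gate_up weight with zeros for the missing up_proj half so the
# up projection is left untouched by the adaptor.
from typing import Optional

import torch


def stack_gate_up(gate_weight: torch.Tensor, up_weight: Optional[torch.Tensor]) -> torch.Tensor:
    if up_weight is None:
        up_weight = torch.zeros_like(gate_weight)  # zero LoRA delta for up_proj
    return torch.cat([gate_weight, up_weight], dim=0)
```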
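The temporary restriction amounts to a guard of this shape (the module names checked and the function itself are assumptions, not the exact code in lora_config.py):

```python
# Hypothetical guard: reject adaptors that target embedding or LM head
# layers until LoRA support for them is added.
UNSUPPORTED_LORA_MODULES = {"embed_tokens", "lm_head"}


def validate_target_modules(target_modules):
    unsupported = UNSUPPORTED_LORA_MODULES.intersection(target_modules)
    if unsupported:
        raise ValueError(
            f"LoRA weights for {sorted(unsupported)} are not supported yet."
        )
```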
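And the runner.py change is conceptually a small decoding guard like the one below (the helper and its call site are assumptions):

```python
# Hypothetical sketch: skip special tokens when decoding and return an empty
# string instead of failing when a model generates no tokens.
def decode_output(tokenizer, output_ids):
    if not output_ids:
        return ""
    return tokenizer.decode(output_ids, skip_special_tokens=True)
```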
Checklist
- [x] Add backend test support for single adaptor, single prompt inference.
- [ ] Add backend test support for single adaptor, batched prompt serving.
- [ ] Add backend test support for multi-adaptor, same rank.
- [ ] Add backend test support for multi-adaptor, different rank.
- [ ] Add backend test support for adaptors with embedding and LM head layer weights.
LGTM