[Fix] Fix bugs and refactor LoRA code for better scalability.
Motivation
https://github.com/sgl-project/sglang/issues/3414 reports limited LoRA model support compared to test_generation_models.py. This PR refines the LoRA dataclasses and tests several trending LoRA models from Hugging Face, uncovering some bugs and adding warnings so they can be addressed later.
Modifications
- [x] Added an independent folder for LoRA-related tests.
- [x] Introduced LoRAAdaptor and LoRAModelCase dataclasses in utils.py for cleaner test case management in test_lora_backend.py (sketched after this list).
- [x] Made tolerances in test_lora_backend.py dynamic, set per model and per adaptor (see the same sketch below).
- [x] Fixed an issue where some LoRA adaptors ship only gate weights and no up weights by initializing the missing up weights to zero in lora.py (sketched below).
- [x] Added a temporary restriction in lora_config.py that rejects embedding and LM head LoRA weights, which are not supported yet (sketched below).
- [x] Added a skeleton for multi-LoRA backend tests.
- [x] Fixed the handling of empty responses and ensured special tokens are skipped during decoding in runner.py (sketched below).
- [x] Identified an accuracy problem when using flashinfer as the backend.
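For reference, here is a minimal sketch of what the new test-case structures and dynamic tolerances could look like; the field names, defaults, and the helper function are illustrative assumptions, not the exact definitions in utils.py / test_lora_backend.py:

```python
# Hypothetical sketch of the test-case dataclasses and per-case tolerance
# lookup; names and defaults are assumptions for illustration only.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoRAAdaptor:
    name: str                                  # Hugging Face id of the LoRA adaptor
    prefill_tolerance: Optional[float] = None  # per-adaptor overrides
    decode_tolerance: Optional[float] = None
    rouge_l_tolerance: Optional[float] = None


@dataclass
class LoRAModelCase:
    base: str                                  # Hugging Face id of the base model
    adaptors: List[LoRAAdaptor] = field(default_factory=list)
    prefill_tolerance: float = 1e-1            # model-level defaults
    decode_tolerance: float = 1e-1
    rouge_l_tolerance: float = 1.0


def resolve_prefill_tolerance(case: LoRAModelCase, adaptor: LoRAAdaptor) -> float:
    """Use the adaptor-specific tolerance when set, otherwise the model default."""
    if adaptor.prefill_tolerance is not None:
        return adaptor.prefill_tolerance
    return case.prefill_tolerance
```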
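The gate-only fix can be pictured roughly as follows; the function and tensor names are assumptions, and the actual change lives in lora.py:

```python
# Hypothetical sketch: when an adaptor provides only gate_proj LoRA weights,
# pad the fused gate_up weight with zeros for the missing up_proj half so the
# up projection is left untouched by the adaptor.
from typing import Optional

import torch


def stack_gate_up(gate_weight: torch.Tensor, up_weight: Optional[torch.Tensor]) -> torch.Tensor:
    if up_weight is None:
        up_weight = torch.zeros_like(gate_weight)  # zero LoRA delta for up_proj
    return torch.cat([gate_weight, up_weight], dim=0)
```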
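The temporary restriction amounts to a guard of this shape (the module names checked and the function itself are assumptions, not the exact code in lora_config.py):

```python
# Hypothetical guard: reject adaptors that target embedding or LM head
# layers until LoRA support for them is added.
UNSUPPORTED_LORA_MODULES = {"embed_tokens", "lm_head"}


def validate_target_modules(target_modules):
    unsupported = UNSUPPORTED_LORA_MODULES.intersection(target_modules)
    if unsupported:
        raise ValueError(
            f"LoRA weights for {sorted(unsupported)} are not supported yet."
        )
```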
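And the runner.py change is conceptually a small decoding guard like the one below (the helper and its call site are assumptions):

```python
# Hypothetical sketch: skip special tokens when decoding and return an empty
# string instead of failing when a model generates no tokens.
def decode_output(tokenizer, output_ids):
    if not output_ids:
        return ""
    return tokenizer.decode(output_ids, skip_special_tokens=True)
```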
Checklist
- [x] Add backend test support for single adaptor, single prompt inference.
- [ ] Add backend test support for single adaptor, batched prompt serving.
- [ ] Add backend test support for multi-adaptor, same rank.
- [ ] Add backend test support for multi-adaptor, different rank.
- [ ] Add backend test support for adaptors with embedding and LM head layer weights.
LGTM