[Fix] Fix bugs and refactor LoRA code for better scalability.

Open aoshen524 opened this issue 11 months ago • 1 comments

Motivation

https://github.com/sgl-project/sglang/issues/3414 reports that LoRA model support is limited compared to what test_generation_models.py covers. This PR refines the LoRA dataclasses and tests several trending LoRA models from Hugging Face. The tests uncovered some bugs; this PR adds warnings for them so they can be addressed later.

Modifications

  • [x] Added an independent folder for LoRA-related tests.
  • [x] Introduced LoRAAdaptor and LoRAModelCase dataclasses in utils.py for improved test case management in test_lora_backend.py.
  • [x] Implemented dynamic tolerance settings, based on the model and adaptor, for the tests in test_lora_backend.py.
  • [x] Fixed an issue in lora.py where some LoRA modules have only gate weights and no up weights; the missing up weights are now initialized to zero.
  • [x] Added a temporary restriction in lora_config.py: adaptors targeting the embedding and LM head layers are not supported yet.
  • [x] Added a skeleton for multi-LoRA backend tests.
  • [x] Fixed the handling of empty responses and ensured special tokens are skipped in runner.py.
  • [x] Identified an accuracy problem when using flashinfer as the backend.
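
The test-case dataclasses and the per-model, per-adaptor tolerance described above could look roughly like the sketch below. Only the class names LoRAAdaptor and LoRAModelCase come from this PR; the field names, the `tolerance_for` helper, the `DEFAULT_TOLERANCE` value, and the model names are assumptions made for illustration, not the actual contents of utils.py.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical fallback value; the PR does not state the tolerances it uses.
DEFAULT_TOLERANCE = 1e-1


@dataclass
class LoRAAdaptor:
    """One LoRA adaptor under test (field names are assumptions)."""

    name: str
    # Optional per-adaptor override of the comparison tolerance.
    tolerance: Optional[float] = None


@dataclass
class LoRAModelCase:
    """A base model plus the adaptors tested against it (fields are assumptions)."""

    base: str
    adaptors: List[LoRAAdaptor] = field(default_factory=list)
    # Optional per-model tolerance, used when an adaptor sets none.
    tolerance: Optional[float] = None

    def tolerance_for(self, adaptor: LoRAAdaptor) -> float:
        """Resolve the tolerance: adaptor override, then model value, then default."""
        if adaptor.tolerance is not None:
            return adaptor.tolerance
        if self.tolerance is not None:
            return self.tolerance
        return DEFAULT_TOLERANCE


# Placeholder model and adaptor names, not real Hugging Face repositories.
case = LoRAModelCase(
    base="example-org/base-model",
    adaptors=[
        LoRAAdaptor(name="example-org/adaptor-a"),
        LoRAAdaptor(name="example-org/adaptor-b", tolerance=0.3),
    ],
    tolerance=0.2,
)

# One resolved tolerance per adaptor, in declaration order.
resolved = [case.tolerance_for(a) for a in case.adaptors]
```

Resolving the tolerance in one place keeps the test loop free of per-model special cases: a noisy adaptor gets its own override while every other adaptor inherits the model-level or default value.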

Checklist

  • [x] Add backend test support for single adaptor, single prompt inference.
  • [ ] Add backend test support for single adaptor, batch prompts serving.
  • [ ] Add backend test support for multi-adaptor, same rank.
  • [ ] Add backend test support for multi-adaptor, different rank.
  • [ ] Add backend test support for adaptors with embedding and LM head layer weights.

aoshen524, Feb 18 '25 03:02

LGTM

Fridge003, Feb 19 '25 03:02