NeMo
Add support for LoRA on vLLM
What does this PR do?
Adds support for using LoRA adapters on checkpoints exported to vLLM.
Collection: NLP
Changelog
- Moved the LoRA conversion logic from the convert_nemo_to_canonical.py script to a reusable module
- Implemented on-load conversion of NeMo-format LoRA checkpoints into HF format for vLLM
- Added support for enabling LoRAs on vLLM with automatic max rank detection
- Fixed the logger initialization in the vLLM deployment script
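For reviewers unfamiliar with max rank detection, here is a minimal sketch of the idea. It is not the code in this PR. It assumes the adapter weights are available as a mapping from weight name to 2-D shape; the key names, the helper names `detect_lora_rank` and `round_up_rank`, and the `ASSUMED_ALLOWED_RANKS` tuple are all hypothetical. The set of ranks a given vLLM version accepts should be checked against that version.

```python
from typing import Dict, Sequence, Tuple

# Assumption: the serving backend only accepts a fixed set of max-rank values.
# This tuple is illustrative; verify it against the vLLM version in use.
ASSUMED_ALLOWED_RANKS: Tuple[int, ...] = (8, 16, 32, 64)


def detect_lora_rank(shapes: Dict[str, Tuple[int, int]]) -> int:
    """Return the largest LoRA rank among 2-D adapter weight shapes.

    For a low-rank pair A (r x in) and B (out x r), the rank r is the
    smaller dimension of each matrix, so min(shape) recovers it as long
    as r is smaller than the layer's input and output sizes.
    """
    if not shapes:
        raise ValueError("no LoRA weights found")
    return max(min(shape) for shape in shapes.values())


def round_up_rank(rank: int, allowed: Sequence[int] = ASSUMED_ALLOWED_RANKS) -> int:
    """Pick the smallest allowed value that can hold the detected rank."""
    for candidate in sorted(allowed):
        if candidate >= rank:
            return candidate
    raise ValueError(f"rank {rank} exceeds the largest allowed value {max(allowed)}")


# Hypothetical adapter with two layers of different rank.
example_shapes = {
    "layer0.lora_A": (16, 4096),
    "layer0.lora_B": (4096, 16),
    "layer1.lora_A": (24, 4096),
    "layer1.lora_B": (4096, 24),
}
detected = detect_lora_rank(example_shapes)
max_rank = round_up_rank(detected)
```

Taking the maximum across all layers means a single setting covers adapters whose layers use different ranks, at the cost of reserving more memory than the smaller layers need.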
Usage
python deploy_vllm_triton.py -nc /path/to/checkpoint.nemo -lc /path/to/lora.nemo -tmn TEST ...
python query.py -mn TEST -p "Prompt text" -lt 0
PR Type:
- [X] New Feature