Add support for LoRA on vLLM

Open • apanteleev opened this issue 1 year ago • 1 comment

What does this PR do?

Adds support for using LoRA adapters on checkpoints exported to vLLM.

Collection: NLP

Changelog

  • Moved the LoRA conversion logic from the convert_nemo_to_canonical.py script to a reusable module
  • Implemented on-load conversion of NeMo-format LoRA checkpoints into HF format for vLLM
  • Added support for enabling LoRAs on vLLM with automatic max rank detection
  • Fixed the logger initialization in the vLLM deployment script
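
The "automatic max rank detection" item can be illustrated with a minimal sketch. This is not the code from this PR: the function names `detect_max_lora_rank` and `round_up_rank` are hypothetical, and the sketch assumes the adapter has already been converted to HF/PEFT-style tensors, where a `lora_A` weight has shape (r, in_features) and a `lora_B` weight has shape (out_features, r). The default list of allowed ranks is also an assumption, since the set vLLM accepts for `max_lora_rank` depends on the vLLM version.

```python
from typing import Mapping, Sequence


def detect_max_lora_rank(shapes: Mapping[str, Sequence[int]]) -> int:
    """Return the largest LoRA rank found among adapter tensor shapes.

    `shapes` maps tensor names to their shapes (assumed HF/PEFT layout):
    lora_A is (r, in_features), lora_B is (out_features, r).
    """
    max_rank = 0
    for name, shape in shapes.items():
        if "lora_A" in name:
            max_rank = max(max_rank, shape[0])
        elif "lora_B" in name:
            max_rank = max(max_rank, shape[1])
    if max_rank == 0:
        raise ValueError("no LoRA tensors found in the given shapes")
    return max_rank


def round_up_rank(rank: int, allowed: Sequence[int] = (8, 16, 32, 64)) -> int:
    """Round `rank` up to the smallest allowed value.

    The default `allowed` tuple is an assumption for illustration only.
    """
    for value in sorted(allowed):
        if value >= rank:
            return value
    raise ValueError(f"rank {rank} exceeds the largest allowed value")


# Hypothetical tensor names and shapes for two adapted projections.
example_shapes = {
    "layers.0.q_proj.lora_A.weight": (16, 4096),
    "layers.0.q_proj.lora_B.weight": (4096, 16),
    "layers.0.v_proj.lora_A.weight": (24, 4096),
    "layers.0.v_proj.lora_B.weight": (1024, 24),
}
detected = detect_max_lora_rank(example_shapes)
engine_rank = round_up_rank(detected)
```

The rounded value is what one would presumably pass as vLLM's `max_lora_rank` setting, so that every loaded adapter fits without the user specifying the rank by hand.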

Usage

python deploy_vllm_triton.py -nc /path/to/checkpoint.nemo -lc /path/to/lora.nemo -tmn TEST ...
python query.py -mn TEST -p "Prompt text" -lt 0

PR Type:

  • [X] New Feature

apanteleev, Aug 01 '24 18:08

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

github-actions[bot], Aug 20 '24 01:08