The VRAM usage is more than 48GB.
The paper states that 48GB of GPU memory is enough to finetune a 65B-parameter LLaMA model:
We present QLORA, an efficient finetuning approach that reduces memory usage enough to finetune a 65B parameter model on a single 48GB GPU while
preserving full 16-bit finetuning task performance.
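For reference, here is a rough back-of-envelope of where a 48GB budget should go (illustrative arithmetic, not measured numbers):

# Rough estimate of the memory taken by the quantized base weights
# when finetuning a 65B model with QLoRA.
n_params = 65e9

# NF4 stores base weights at 4 bits, i.e. 0.5 bytes per parameter.
weights_gib = n_params * 0.5 / 1024**3
print(f"4-bit base weights: ~{weights_gib:.1f} GiB")  # ~30.3 GiB

# The rest of the budget has to cover LoRA adapters, (paged) optimizer
# state, activations, and CUDA overhead. Activations grow with batch
# size and sequence length, which could explain usage above 48GB.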
However, running the following command to train a LLaMA 65B model actually consumed about 60GB of VRAM:
python qlora.py
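In case it helps with debugging: loading just the 4-bit base model with the same quantization settings the QLoRA paper describes (NF4 with double quantization and bfloat16 compute) shows how much memory the weights alone take. Below is a minimal sketch using the Hugging Face transformers API; the checkpoint name huggyllama/llama-65b is an assumption, substitute whatever checkpoint you use:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Same 4-bit settings the QLoRA paper describes: NF4 data type with
# double quantization and bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-65b",  # assumed checkpoint; replace with yours
    quantization_config=bnb_config,
    device_map="auto",
)

# Memory taken by the quantized weights alone; training adds adapters,
# optimizer state, and activations on top of this.
print(f"{torch.cuda.memory_allocated() / 1024**3:.1f} GiB allocated")

If the weights alone already sit near 30GiB, the extra memory during training most likely comes from activations, so lowering the per-device batch size or the maximum sequence length would be the first things to try.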