Allow loading model with BitsAndBytes 4bit quantization, PEFT LoRA adapters.

Open idoru opened this issue 2 years ago • 5 comments

Also supports loading PEFT LoRA adapters with MODEL_PEFT=true. For details on the 4-bit quantization options, see: https://huggingface.co/blog/4bit-transformers-bitsandbytes

Implements #202

idoru avatar Jun 04 '23 23:06 idoru

It might be better to split the QLoRA changes from the PEFT LoRA adapter support.

QLoRA/4-bit requires the latest git-master versions of transformers, accelerate, and related libraries (and I don't see those listed in the requirements.txt on this PR).

LoRA adapter support should be possible without bleeding-edge versions of transformers, though, so it'd be great to get that merged in first.

LoopControl avatar Jun 05 '23 00:06 LoopControl

Thanks for the review! I'm very new to working on Python codebases, so haven't fully got the hang of the dependency management workflows and gotchas. I'll split them as you suggested, and fix the requirements.

idoru avatar Jun 05 '23 02:06 idoru

Hugging Face finally released QLoRA-supported versions of transformers and accelerate, which allows us to add basic 4-bit quantization support in https://github.com/hyperonym/basaran/pull/209.

Maybe you can simplify this PR to include only the PEFT changes? Of course, it would also be easier now if you want to add more detailed options for 4-bit quantization, as dependencies are no longer an issue.

peakji avatar Jun 09 '23 06:06 peakji

Hi, thanks for the feedback. I've updated the PR now. Tested with my very amateur QLoRA model with the following:

MODEL_TRUST_REMOTE_CODE=true \
MODEL_LOAD_IN_4BIT=true \
MODEL_4BIT_QUANT_TYPE=nf4 \
MODEL_4BIT_DOUBLE_QUANT=true \
MODEL_PEFT=true \
MODEL=idoru/falcon-40b-nf4dq-chat-oasst1-2epoch-v2 \
PORT=8080 \
python -m basaran
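
For readers following along, here is a minimal sketch of how the environment variables in the command above could map onto the transformers and peft APIs. It is not the code from this PR: the helper names `env_flag`, `bnb_4bit_options`, and `load_model` are hypothetical, and the sketch assumes the MODEL repository holds a PEFT adapter whose config names its base model.

```python
import os
from typing import Any, Dict, Mapping


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Interpret an environment variable as a boolean (hypothetical helper)."""
    return env.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


def bnb_4bit_options(env: Mapping[str, str]) -> Dict[str, Any]:
    """Translate MODEL_* variables into BitsAndBytesConfig keyword arguments."""
    if not env_flag(env, "MODEL_LOAD_IN_4BIT"):
        return {}
    return {
        "load_in_4bit": True,
        # "fp4" is the BitsAndBytesConfig default; "nf4" is the QLoRA data type.
        "bnb_4bit_quant_type": env.get("MODEL_4BIT_QUANT_TYPE", "fp4"),
        "bnb_4bit_use_double_quant": env_flag(env, "MODEL_4BIT_DOUBLE_QUANT"),
    }


def load_model(env: Mapping[str, str] = os.environ):
    """Load a causal LM, optionally 4-bit quantized and wrapped with a PEFT adapter."""
    # Heavy imports are deferred so the helpers above work without a GPU stack.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    name = env["MODEL"]
    kwargs: Dict[str, Any] = {
        "trust_remote_code": env_flag(env, "MODEL_TRUST_REMOTE_CODE"),
    }
    options = bnb_4bit_options(env)
    if options:
        kwargs["quantization_config"] = BitsAndBytesConfig(**options)
        kwargs["device_map"] = "auto"

    if env_flag(env, "MODEL_PEFT"):
        from peft import PeftConfig, PeftModel

        # Assumption: MODEL points at an adapter repo; its config names the base model.
        base_name = PeftConfig.from_pretrained(name).base_model_name_or_path
        base = AutoModelForCausalLM.from_pretrained(base_name, **kwargs)
        return PeftModel.from_pretrained(base, name)

    return AutoModelForCausalLM.from_pretrained(name, **kwargs)
```

Keeping the option parsing in a pure function makes it possible to check the variable mapping without downloading a model or installing bitsandbytes.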

idoru avatar Jun 15 '23 00:06 idoru

Codecov Report

Patch coverage: 36.84% and project coverage change: -2.61% :warning:

Comparison is base (1677491) 94.29% compared to head (33a37c1) 91.69%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #203      +/-   ##
==========================================
- Coverage   94.29%   91.69%   -2.61%     
==========================================
  Files           7        7              
  Lines         333      349      +16     
==========================================
+ Hits          314      320       +6     
- Misses         19       29      +10     

Impacted Files         Coverage Δ
basaran/model.py       83.52% <25.00%> (-5.01%) :arrow_down:
basaran/__init__.py    96.87% <100.00%> (+0.32%) :arrow_up:

codecov-commenter avatar Jun 15 '23 04:06 codecov-commenter