axolotl Can't load BnB models

Please check that this issue hasn't been reported before.

[X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

I want to load a BnB quantized model.

Current behaviour

It throws a ValueError.

Steps to reproduce

Launch the config yaml.

Config yaml

base_model: unsloth/tinyllama-bnb-4bit
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: HuggingFaceH4/ultrachat_200k
    split: train_sft
    type: sharegpt
    conversation: chatml

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 1096
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_name: tinyllama
wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
max_steps: 20
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: false
fp16: true
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_steps: 10
evals_per_epoch:
saves_per_epoch:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  eos_token: "<|im_end|>"

tokens:
- "<|im_start|>"

Possible solution

Extend or remove the fixed check of gptq introduced here: https://github.com/OpenAccess-AI-Collective/axolotl/pull/913

Which Operating Systems are you using?

[X] Linux
[ ] macOS
[ ] Windows

Python Version

3.10

axolotl branch-commit

main

Acknowledgements

[X] My issue title is concise, descriptive, and in title casing.
[X] I have searched the existing issues to make sure this bug has not been reported yet.
[X] I am using the latest version of axolotl.
[X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

Apr 10 '24 20:04 Blaizzy

More details regarding the error, please. Were you also the one who posted a bnb issue on Discord?

Apr 12 '24 08:04 NanoCode012

Is there any update on this error? I have a similar issue.

Jun 21 '24 17:06 tsunayoshi21

@NanoCode012

Could you let me know what else you are looking for?

Jun 22 '24 08:06 Blaizzy

Could someone post logs of the issue? Is it due to the check of quant_config?

Jun 22 '24 11:06 NanoCode012

Ayt, got it!

I will post the logs later today

Jun 22 '24 12:06 Blaizzy

@NanoCode012 Yes, for me the error is that the quant_config check always raises an error because the quant_method is not gptq. If I set gptq: false in the YAML, it raises a different error saying I can't load a quantized model without gptq.

So if my model is already BnB-quantized, I have no idea how to fine-tune it with axolotl.
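
The behaviour described above can be sketched roughly as follows. This is a hypothetical reconstruction based only on the description in this thread, not axolotl's actual code: the function names `strict_gptq_check` and `relaxed_check`, the dict shapes, and the error messages are all assumptions made for illustration. The second function shows one way the check might be extended to accept bitsandbytes checkpoints.

```python
from typing import Optional


def strict_gptq_check(quant_config: Optional[dict], gptq: bool) -> None:
    """Hypothetical check that only accepts GPTQ-quantized checkpoints."""
    if quant_config is None:
        if gptq:
            raise ValueError("gptq is set but the model has no quantization_config")
        return
    if not gptq:
        # A BnB checkpoint with gptq: false ends up here.
        raise ValueError("cannot load a quantized model without gptq")
    if quant_config.get("quant_method") != "gptq":
        # A BnB checkpoint with gptq: true ends up here.
        raise ValueError("quant_method is not gptq")


def relaxed_check(
    quant_config: Optional[dict],
    gptq: bool,
    load_in_4bit: bool = False,
    load_in_8bit: bool = False,
) -> None:
    """Hypothetical extension that also accepts bitsandbytes checkpoints."""
    if quant_config is None:
        if gptq:
            raise ValueError("gptq is set but the model has no quantization_config")
        return
    method = quant_config.get("quant_method")
    if method == "gptq":
        if not gptq:
            raise ValueError("model is GPTQ-quantized; set gptq: true")
        return
    if method == "bitsandbytes":
        if gptq:
            raise ValueError("gptq is set but the model is bitsandbytes-quantized")
        if not (load_in_4bit or load_in_8bit):
            raise ValueError("set load_in_4bit or load_in_8bit for a BnB model")
        return
    raise ValueError(f"unsupported quant_method: {method}")
```

With the strict version, a bitsandbytes checkpoint fails whichever way `gptq` is set, which matches the dead end described in this comment. The relaxed version lets it through when `load_in_4bit` or `load_in_8bit` is enabled.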

Jun 22 '24 23:06 tsunayoshi21

@Blaizzy what was your fix?

Jul 08 '24 13:07 FrederikHandberg

@Blaizzy what was your fix?

I used a full-precision model and set load_in_4bit: true.

Example:

base_model: meta/llama-7b-hf
load_in_4bit: true

However, what I actually wanted was to load a prequantized model:

base_model: meta/llama-7b-hf-4bit
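
To tell the two cases apart before launching a run, one option is to look at the checkpoint's `config.json`. The sketch below assumes the common Hugging Face convention that a prequantized checkpoint carries a `quantization_config` entry with a `quant_method` field; the helper name `detect_quant_method` and the sample JSON strings are made up for illustration.

```python
import json
from typing import Optional


def detect_quant_method(config_json: str) -> Optional[str]:
    """Return the quant_method recorded in a config.json string, if any."""
    config = json.loads(config_json)
    quant = config.get("quantization_config")
    if not quant:
        # No quantization_config: treat the checkpoint as full precision.
        return None
    return quant.get("quant_method")


# Illustrative config.json contents (not copied from any real repository).
full_precision = '{"model_type": "llama"}'
prequantized = (
    '{"model_type": "llama", '
    '"quantization_config": {"quant_method": "bitsandbytes", "load_in_4bit": true}}'
)

print(detect_quant_method(full_precision))
print(detect_quant_method(prequantized))
```

A result of `None` corresponds to the first config above (full-precision weights quantized at load time), while `bitsandbytes` corresponds to the prequantized case.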

Jul 08 '24 15:07 Blaizzy

Thanks

+1, I'd like to do the same (it would be a nice addition).

Jul 08 '24 22:07 FrederikHandberg

Hey, sorry for taking so long to get back to this. To follow up: I reused the author's config with some modifications (removed wandb, changed the dataset loading, switched to bf16, and saved embed_tokens and lm_head), and it runs for me.

base_model: unsloth/tinyllama-bnb-4bit
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

chat_template: chatml
datasets:
  - path: HuggingFaceH4/ultrachat_200k
    split: train_sft
    type: chat_template

    field_messages: messages
    message_field_role: role
    message_field_content: content


dataset_prepared_path:
val_set_size: 0.05
output_dir: ./qlora-out

adapter: qlora
lora_model_dir:

sequence_len: 1096
sample_packing: true
pad_to_sequence_len: true

lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save: ['embed_tokens', 'lm_head']

# wandb_project: axolotl
# wandb_entity:
# wandb_watch:
# wandb_name: tinyllama
# wandb_log_model:

gradient_accumulation_steps: 1
micro_batch_size: 1
max_steps: 20
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: true
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: false

warmup_steps: 10
evals_per_epoch:
saves_per_epoch:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  eos_token: "<|im_end|>"

tokens:
- "<|im_start|>"

Would any of you be able to re-test and confirm that it now works?

Oct 31 '24 08:10 NanoCode012
