Axolotl install and training are broken.
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
Looks like something is broken in the latest axolotl. Earlier it was working fine with the latest torch:
git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl
pip install torch
pip3 install packaging ninja wheel
pip3 install -e '.[flash-attn,deepspeed]'
But now it gives errors related to dependencies.
The following works:
pip install torch=="2.1.2"
pip install -e git+https://github.com/OpenAccess-AI-Collective/axolotl#egg=axolotl
pip install flash-attn=="2.5.0"
pip install deepspeed=="0.13.1"
But then it fails later during training. Even the Colab example notebook is failing.
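For reference, a quick way to check what the pinned workaround actually installed (a generic pip check on my part, not from the original report; package names are the usual PyPI names):
pip show torch flash-attn deepspeed transformers axolotl | grep -E "^(Name|Version)"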
Current behaviour
During package installation, dependency errors are seen.
The error below is seen during training with the downgraded torch.
[2024-04-26 17:29:07,623] [ERROR] [axolotl.load_model:673] [PID:7553] [RANK:0]
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the
quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules
in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to
from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
Traceback (most recent call last):
File "/content/src/axolotl/src/axolotl/utils/models.py", line 630, in load_model
model = getattr(transformers, model_type).from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3050, in from_pretrained
hf_quantizer.validate_environment(
File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 86, in validate_environment
raise ValueError(
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the
quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules
in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to
from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/content/src/axolotl/src/axolotl/cli/train.py", line 59, in load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to
from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
Steps to reproduce
Try the Axolotl example notebook. During training it fails with the error above. It worked perfectly last week. I tried the Gemma 7B model too, and it also fails.
Config yaml
base_model: google/gemma-7b
#base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_type: AutoModelForCausalLM #For Gemma
#model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
- path: /content/test_txt_data-10exmpl.json
type: completion
field: text
#datasets:
# - path: ./mar_alpaca_dataset.json
# type: alpaca
# ds_type: json
dataset_prepared_path: /content
dataset_processes: 10
val_set_size: 0
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
sequence_len: 700
sample_packing: true
pad_to_sequence_len: true
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
- gate_proj
- down_proj
- up_proj
#lora_modules_to_save:
#- embed_tokens
#- lm_head
lora_target_linear: true
lora_fan_in_fan_out:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: false
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: False
warmup_ratio: 0.1
evals_per_epoch: 1
eval_table_size:
eval_max_new_tokens: 128
eval_sample_packing: False
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
save_safetensors: True
gpu_memory_limit: 14
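One possibly related detail (an assumption on my part, not verified): gpu_memory_limit: 14 caps the per-GPU memory used when building the device map, and a cap that is too tight can push modules onto CPU/disk, which is exactly what the ValueError above complains about. A sketch of the change to try:
# Assumption: raising or removing the cap keeps the whole quantized model on the GPU.
gpu_memory_limit:        # leave unset to use all available GPU memory
# or, if a cap is needed, something closer to the card's actual free memory, e.g.:
# gpu_memory_limit: 15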
Possible solution
Fix the errors and dependencies.
Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [ ] Windows
Python Version
3.10
axolotl branch-commit
latest
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
I was having trouble, but running `export BNB_CUDA_VERSION=` as shown below helped get me to the next problem:
root@62d88bdd9d38:/workspace/axolotl# export BNB_CUDA_VERSION=
root@62d88bdd9d38:/workspace/axolotl# accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2024-04-29 13:28:37,017] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-29 13:28:37,881] [WARNING] [axolotl.utils.config.models.input.hint_sample_packing_padding:686] [PID:343] [RANK:0] `pad_to_sequence_len: true` is recommended when using sample_packing
[2024-04-29 13:28:38,092] [INFO] [axolotl.normalize_config:182] [PID:343] [RANK:0] GPU memory usage baseline: 0.000GB (+0.600GB misc)
dP dP dP
88 88 88
.d8888b. dP. .dP .d8888b. 88 .d8888b. d8888P 88
88' `88 `8bd8' 88' `88 88 88' `88 88 88
88. .88 .d88b. 88. .88 88 88. .88 88 88
`88888P8 dP' `dP `88888P' dP `88888P' dP dP
****************************************
**** Axolotl Dependency Versions *****
accelerate: 0.28.0
peft: 0.10.0
transformers: 4.40.0.dev0
trl: 0.8.5
torch: 2.1.2+cu118
bitsandbytes: 0.43.0
****************************************
[2024-04-29 13:28:38,126] [WARNING] [axolotl.scripts.check_user_token:464] [PID:343] [RANK:0] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from https://huggingface.co/settings/tokens if you want to use gated models or datasets.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
[2024-04-29 13:28:38,456] [DEBUG] [axolotl.load_tokenizer:279] [PID:343] [RANK:0] EOS: 2 / </s>
[2024-04-29 13:28:38,457] [DEBUG] [axolotl.load_tokenizer:280] [PID:343] [RANK:0] BOS: 1 / <s>
[2024-04-29 13:28:38,457] [DEBUG] [axolotl.load_tokenizer:281] [PID:343] [RANK:0] PAD: 2 / </s>
[2024-04-29 13:28:38,457] [DEBUG] [axolotl.load_tokenizer:282] [PID:343] [RANK:0] UNK: 0 / <unk>
[2024-04-29 13:28:38,457] [INFO] [axolotl.load_tokenizer:293] [PID:343] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-04-29 13:28:38,457] [INFO] [axolotl.load_tokenized_prepared_datasets:183] [PID:343] [RANK:0] Unable to find prepared dataset in last_run_prepared/8cc35674c453a287d7de953d7084a596
[2024-04-29 13:28:38,457] [INFO] [axolotl.load_tokenized_prepared_datasets:184] [PID:343] [RANK:0] Loading raw datasets...
[2024-04-29 13:28:38,457] [WARNING] [axolotl.load_tokenized_prepared_datasets:186] [PID:343] [RANK:0] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset.
[2024-04-29 13:28:38,457] [INFO] [axolotl.load_tokenized_prepared_datasets:193] [PID:343] [RANK:0] No seed provided, using default seed of 42
Repo card metadata block was not found. Setting CardData to empty.
[2024-04-29 13:28:39,088] [WARNING] [huggingface_hub.repocard.content:107] [PID:343] Repo card metadata block was not found. Setting CardData to empty.
Repo card metadata block was not found. Setting CardData to empty.
[2024-04-29 13:28:43,084] [WARNING] [huggingface_hub.repocard.content:107] [PID:343] Repo card metadata block was not found. Setting CardData to empty.
[2024-04-29 13:28:45,399] [INFO] [axolotl.load_tokenized_prepared_datasets:410] [PID:343] [RANK:0] merging datasets
[2024-04-29 13:28:45,438] [INFO] [axolotl.load_tokenized_prepared_datasets:423] [PID:343] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/8cc35674c453a287d7de953d7084a596
Saving the dataset (1/1 shards): 100%|████████████████████| 54568/54568 [00:00<00:00, 91839.43 examples/s]
[2024-04-29 13:28:46,062] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] total_num_tokens: 182_913
[2024-04-29 13:28:46,070] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] `total_supervised_tokens: 38_104`
[2024-04-29 13:28:50,077] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:343] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 182913
[2024-04-29 13:28:50,077] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] data_loader_len: 43
[2024-04-29 13:28:50,077] [INFO] [axolotl.log:61] [PID:343] [RANK:0] sample_packing_eff_est across ranks: [0.9501381732047872]
[2024-04-29 13:28:50,077] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] sample_packing_eff_est: None
[2024-04-29 13:28:50,078] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] total_num_steps: 172
[2024-04-29 13:28:50,126] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] total_num_tokens: 10_466_111
[2024-04-29 13:28:50,510] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] `total_supervised_tokens: 6_735_490`
[2024-04-29 13:28:50,546] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:343] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 10466111
[2024-04-29 13:28:50,547] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] data_loader_len: 2529
[2024-04-29 13:28:50,547] [INFO] [axolotl.log:61] [PID:343] [RANK:0] sample_packing_eff_est across ranks: [0.9323856525668217]
[2024-04-29 13:28:50,547] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] sample_packing_eff_est: 0.94
[2024-04-29 13:28:50,547] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] total_num_steps: 10116
[2024-04-29 13:28:50,554] [DEBUG] [axolotl.train.log:61] [PID:343] [RANK:0] loading tokenizer... openlm-research/open_llama_3b_v2
[2024-04-29 13:28:50,852] [DEBUG] [axolotl.load_tokenizer:279] [PID:343] [RANK:0] EOS: 2 / </s>
[2024-04-29 13:28:50,852] [DEBUG] [axolotl.load_tokenizer:280] [PID:343] [RANK:0] BOS: 1 / <s>
[2024-04-29 13:28:50,852] [DEBUG] [axolotl.load_tokenizer:281] [PID:343] [RANK:0] PAD: 2 / </s>
[2024-04-29 13:28:50,852] [DEBUG] [axolotl.load_tokenizer:282] [PID:343] [RANK:0] UNK: 0 / <unk>
[2024-04-29 13:28:50,852] [INFO] [axolotl.load_tokenizer:293] [PID:343] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-04-29 13:28:50,852] [DEBUG] [axolotl.train.log:61] [PID:343] [RANK:0] loading model and peft_config...
[2024-04-29 13:28:51,000] [INFO] [axolotl.load_model:359] [PID:343] [RANK:0] patching with flash attention for sample packing
[2024-04-29 13:28:51,001] [INFO] [axolotl.load_model:408] [PID:343] [RANK:0] patching _expand_mask
`low_cpu_mem_usage` was None, now set to True since model is quantized.
pytorch_model.bin: 100%|█████████████████████████████████████████████| 6.85G/6.85G [01:37<00:00, 70.1MB/s]
/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
generation_config.json: 100%|█████████████████████████████████████████████| 137/137 [00:00<00:00, 370kB/s]
[2024-04-29 13:30:31,716] [INFO] [axolotl.load_model:720] [PID:343] [RANK:0] GPU memory usage after model load: 3.430GB (+0.146GB cache, +0.569GB misc)
[2024-04-29 13:30:31,728] [INFO] [axolotl.load_model:771] [PID:343] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2024-04-29 13:30:31,730] [INFO] [axolotl.load_model:780] [PID:343] [RANK:0] converting modules to torch.float16 for flash attention
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2024-04-29 13:30:31,954] [INFO] [axolotl.load_model:825] [PID:343] [RANK:0] GPU memory usage after adapters: 3.478GB (+0.911GB cache, +0.569GB misc)
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-29 13:30:32,017] [INFO] [axolotl.train.log:61] [PID:343] [RANK:0] Pre-saving adapter config to ./lora-out
[2024-04-29 13:30:32,020] [INFO] [axolotl.train.log:61] [PID:343] [RANK:0] Starting trainer...
model.safetensors: 0%| | 0.00/6.85G [00:00<?, ?B/s][2024-04-29 13:30:32,189] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:343] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 10466111
[2024-04-29 13:30:32,226] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:343] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 10466111
[2024-04-29 13:30:32,318] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:343] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 10466111
model.safetensors: 0%| | 10.5M/6.85G [00:00<03:41, 30.8MB/s]/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
model.safetensors: 0%|▏ | 21.0M/6.85G [00:00<02:27, 46.3MB/s]Traceback (most recent call last):
File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 59, in <module>
fire.Fire(do_cli)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
return do_train(parsed_cfg, parsed_cli_args)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 55, in do_train
return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
File "/workspace/axolotl/src/axolotl/train.py", line 163, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1837, in train
return inner_training_loop(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2181, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 3116, in training_step
loss = self.compute_loss(model, inputs)
File "/workspace/axolotl/src/axolotl/core/trainer_builder.py", line 492, in compute_loss
return super().compute_loss(model, inputs, return_outputs=return_outputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 3139, in compute_loss
outputs = model(**inputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/peft/peft_model.py", line 1129, in forward
return self.base_model(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1189, in forward
outputs = self.model(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/workspace/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 809, in llama_model_forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
return fn(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
outputs = run_function(*args)
File "/workspace/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 803, in custom_forward
return module(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/workspace/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 902, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/workspace/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 478, in flashattn_forward
output = flash_attn_varlen_qkvpacked_func(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 887, in flash_attn_varlen_qkvpacked_func
return FlashAttnVarlenQKVPackedFunc.apply(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 288, in forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 85, in _flash_attn_varlen_forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
model.safetensors: 100%|█████████████████████████████████████████████| 6.85G/6.85G [01:52<00:00, 61.0MB/s]
0%| | 0/21524 [01:52<?, ?it/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
simple_launcher(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/py3.10/bin/python3', '-m', 'axolotl.cli.train', 'examples/openllama-3b/lora.yml']' returned non-zero exit status 1.
Also, `apt-get install python-is-python3` helped fix another problem I was having.
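Putting those two tweaks together for reference (the comments are my interpretation of what each does, so treat them as assumptions):
export BNB_CUDA_VERSION=                 # clear any bitsandbytes CUDA-version override
apt-get install -y python-is-python3     # make `python` resolve to python3
accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml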
@MeDott29, you may be having FA issues: `RuntimeError: FlashAttention only supports Ampere GPUs or newer.`
@amitagh, did you try other example configs?
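For the FlashAttention error on pre-Ampere GPUs (e.g. T4/V100), one workaround to try, sketched against the config keys used earlier in this thread (not a confirmed fix), is to turn flash attention off:
flash_attention: false
# The log above shows sample packing is patched via flash attention, so this may
# need to be disabled as well (an assumption, not verified):
sample_packing: false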
Running colab (T4 free version) example produces the following error:
File "/content/src/axolotl/src/axolotl/train.py", line 170, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1859, in train return inner_training_loop( File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2249, in inner_training_loop grad_norm = self.accelerator.clip_grad_norm( File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2269, in clip_grad_norm self.unscale_gradients() File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2219, in unscale_gradients self.scaler.unscale_(opt) File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 307, in unscale_ optimizer_state["found_inf_per_device"] = self.unscale_grads( File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 248, in unscale_grads torch.amp_foreach_non_finite_check_and_unscale( RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
Speculation:
This might be due to the T4 not supporting bfloat16, although disabling it produces the same error.
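If that speculation is right, one thing to try (a sketch based on the precision keys in the config earlier in this thread, not a confirmed fix) is to force fp16 mixed precision and keep bf16 off on a T4:
bf16: false   # T4 (Turing) has no bfloat16 support
fp16: true
tf32: false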
I have hit exactly the same error as jaydeepthik's while running Axolotl training in a Colab notebook with an L4 GPU.
File "/usr/local/lib/python3.10/dist-packages/torch/amp/grad_scaler.py", line 278, in _unscale_grads_
torch._amp_foreach_non_finite_check_and_unscale_(
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
The error occurred in the following code cell.
!accelerate launch -m axolotl.cli.train /content/llama-3-8b-Instruct-bnb-4bit-qlora.yaml
Would you take a look?
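For what it's worth, a quick check of whether the runtime's GPU reports bfloat16 support (a generic PyTorch diagnostic, not something from the notebook):
python -c "import torch; print(torch.cuda.get_device_name(0), torch.cuda.is_bf16_supported())"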
@chdaesung @jaydeepthik , sorry, I missed the notifications. Do you have a sample Colab notebook? I was able to run successfully on Colab these past weeks.
Hi, we've had a major overhaul since then, and Axolotl installation is now much easier. Here is an updated Colab notebook that details installation: https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/colab-notebooks/colab-axolotl-example.ipynb