Axolotl install and training are broken.
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
Looks like something is broken in the latest axolotl. Earlier it was working fine with the latest torch:
git clone https://github.com/OpenAccess-AI-Collective/axolotl
cd axolotl
pip install torch
pip3 install packaging ninja wheel
pip3 install -e '.[flash-attn,deepspeed]'
But now it gives errors related to dependencies.
The following works:
pip install torch=="2.1.2"
pip install -e git+https://github.com/OpenAccess-AI-Collective/axolotl#egg=axolotl
pip install flash-attn=="2.5.0"
pip install deepspeed=="0.13.1"
But then it fails later during training. Even the Colab example notebook is failing.
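For reference, a quick way to check what the pinned workaround actually installed (a generic pip check on my part, not from the original report; package names are the usual PyPI names):
pip show torch flash-attn deepspeed transformers axolotl | grep -E "^(Name|Version)"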
Current behaviour
During package installation, dependency errors are seen.
The error below is seen during training with the downgraded torch.
[2024-04-26 17:29:07,623] [ERROR] [axolotl.load_model:673] [PID:7553] [RANK:0]
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the
quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules
in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to
from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
Traceback (most recent call last):
File "/content/src/axolotl/src/axolotl/utils/models.py", line 630, in load_model
model = getattr(transformers, model_type).from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 563, in from_pretrained
return model_class.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3050, in from_pretrained
hf_quantizer.validate_environment(
File "/usr/local/lib/python3.10/dist-packages/transformers/quantizers/quantizer_bnb_4bit.py", line 86, in validate_environment
raise ValueError(
ValueError:
Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the
quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules
in 32-bit, you need to set load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to
from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/content/src/axolotl/src/axolotl/cli/train.py", line 59, in load_in_8bit_fp32_cpu_offload=True and pass a custom device_map to
from_pretrained. Check
https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu
for more details.
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in
Steps to reproduce
Try the Axolotl example notebook. During training it fails with the error above. It worked perfectly last week. I tried the Gemma 7B model too, and it also fails.
Config yaml
base_model: google/gemma-7b
#base_model: meta-llama/Meta-Llama-3-8B-Instruct
model_type: AutoModelForCausalLM #For Gemma
#model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
- path: /content/test_txt_data-10exmpl.json
type: completion
field: text
#datasets:
# - path: ./mar_alpaca_dataset.json
# type: alpaca
# ds_type: json
dataset_prepared_path: /content
dataset_processes: 10
val_set_size: 0
output_dir: ./qlora-out
adapter: qlora
lora_model_dir:
sequence_len: 700
sample_packing: true
pad_to_sequence_len: true
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
- gate_proj
- down_proj
- up_proj
#lora_modules_to_save:
#- embed_tokens
#- lm_head
lora_target_linear: true
lora_fan_in_fan_out:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: false
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: False
warmup_ratio: 0.1
evals_per_epoch: 1
eval_table_size:
eval_max_new_tokens: 128
eval_sample_packing: False
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
save_safetensors: True
gpu_memory_limit: 14
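One possibly related detail (an assumption on my part, not verified): gpu_memory_limit: 14 caps the per-GPU memory used when building the device map, and a cap that is too tight can push modules onto CPU/disk, which is exactly what the ValueError above complains about. A sketch of the change to try:
# Assumption: raising or removing the cap keeps the whole quantized model on the GPU.
gpu_memory_limit:        # leave unset to use all available GPU memory
# or, if a cap is needed, something closer to the card's actual free memory, e.g.:
# gpu_memory_limit: 15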
Possible solution
Fix the errors and dependencies.
Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [ ] Windows
Python Version
3.10
axolotl branch-commit
latest
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
I was having trouble, but running `export BNB_CUDA_VERSION=` as shown below helped get me to the next problem:
root@62d88bdd9d38:/workspace/axolotl# export BNB_CUDA_VERSION=
root@62d88bdd9d38:/workspace/axolotl# accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
[2024-04-29 13:28:37,017] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-04-29 13:28:37,881] [WARNING] [axolotl.utils.config.models.input.hint_sample_packing_padding:686] [PID:343] [RANK:0] `pad_to_sequence_len: true` is recommended when using sample_packing
[2024-04-29 13:28:38,092] [INFO] [axolotl.normalize_config:182] [PID:343] [RANK:0] GPU memory usage baseline: 0.000GB (+0.600GB misc)
dP dP dP
88 88 88
.d8888b. dP. .dP .d8888b. 88 .d8888b. d8888P 88
88' `88 `8bd8' 88' `88 88 88' `88 88 88
88. .88 .d88b. 88. .88 88 88. .88 88 88
`88888P8 dP' `dP `88888P' dP `88888P' dP dP
****************************************
**** Axolotl Dependency Versions *****
accelerate: 0.28.0
peft: 0.10.0
transformers: 4.40.0.dev0
trl: 0.8.5
torch: 2.1.2+cu118
bitsandbytes: 0.43.0
****************************************
[2024-04-29 13:28:38,126] [WARNING] [axolotl.scripts.check_user_token:464] [PID:343] [RANK:0] Error verifying HuggingFace token. Remember to log in using `huggingface-cli login` and get your access token from https://huggingface.co/settings/tokens if you want to use gated models or datasets.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
[2024-04-29 13:28:38,456] [DEBUG] [axolotl.load_tokenizer:279] [PID:343] [RANK:0] EOS: 2 / </s>
[2024-04-29 13:28:38,457] [DEBUG] [axolotl.load_tokenizer:280] [PID:343] [RANK:0] BOS: 1 / <s>
[2024-04-29 13:28:38,457] [DEBUG] [axolotl.load_tokenizer:281] [PID:343] [RANK:0] PAD: 2 / </s>
[2024-04-29 13:28:38,457] [DEBUG] [axolotl.load_tokenizer:282] [PID:343] [RANK:0] UNK: 0 / <unk>
[2024-04-29 13:28:38,457] [INFO] [axolotl.load_tokenizer:293] [PID:343] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-04-29 13:28:38,457] [INFO] [axolotl.load_tokenized_prepared_datasets:183] [PID:343] [RANK:0] Unable to find prepared dataset in last_run_prepared/8cc35674c453a287d7de953d7084a596
[2024-04-29 13:28:38,457] [INFO] [axolotl.load_tokenized_prepared_datasets:184] [PID:343] [RANK:0] Loading raw datasets...
[2024-04-29 13:28:38,457] [WARNING] [axolotl.load_tokenized_prepared_datasets:186] [PID:343] [RANK:0] Processing datasets during training can lead to VRAM instability. Please pre-process your dataset.
[2024-04-29 13:28:38,457] [INFO] [axolotl.load_tokenized_prepared_datasets:193] [PID:343] [RANK:0] No seed provided, using default seed of 42
Repo card metadata block was not found. Setting CardData to empty.
[2024-04-29 13:28:39,088] [WARNING] [huggingface_hub.repocard.content:107] [PID:343] Repo card metadata block was not found. Setting CardData to empty.
Repo card metadata block was not found. Setting CardData to empty.
[2024-04-29 13:28:43,084] [WARNING] [huggingface_hub.repocard.content:107] [PID:343] Repo card metadata block was not found. Setting CardData to empty.
[2024-04-29 13:28:45,399] [INFO] [axolotl.load_tokenized_prepared_datasets:410] [PID:343] [RANK:0] merging datasets
[2024-04-29 13:28:45,438] [INFO] [axolotl.load_tokenized_prepared_datasets:423] [PID:343] [RANK:0] Saving merged prepared dataset to disk... last_run_prepared/8cc35674c453a287d7de953d7084a596
Saving the dataset (1/1 shards): 100%|████████████████████| 54568/54568 [00:00<00:00, 91839.43 examples/s]
[2024-04-29 13:28:46,062] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] total_num_tokens: 182_913
[2024-04-29 13:28:46,070] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] `total_supervised_tokens: 38_104`
[2024-04-29 13:28:50,077] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:343] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 182913
[2024-04-29 13:28:50,077] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] data_loader_len: 43
[2024-04-29 13:28:50,077] [INFO] [axolotl.log:61] [PID:343] [RANK:0] sample_packing_eff_est across ranks: [0.9501381732047872]
[2024-04-29 13:28:50,077] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] sample_packing_eff_est: None
[2024-04-29 13:28:50,078] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] total_num_steps: 172
[2024-04-29 13:28:50,126] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] total_num_tokens: 10_466_111
[2024-04-29 13:28:50,510] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] `total_supervised_tokens: 6_735_490`
[2024-04-29 13:28:50,546] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:343] [RANK:0] packing_efficiency_estimate: 1.0 total_num_tokens per device: 10466111
[2024-04-29 13:28:50,547] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] data_loader_len: 2529
[2024-04-29 13:28:50,547] [INFO] [axolotl.log:61] [PID:343] [RANK:0] sample_packing_eff_est across ranks: [0.9323856525668217]
[2024-04-29 13:28:50,547] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] sample_packing_eff_est: 0.94
[2024-04-29 13:28:50,547] [DEBUG] [axolotl.log:61] [PID:343] [RANK:0] total_num_steps: 10116
[2024-04-29 13:28:50,554] [DEBUG] [axolotl.train.log:61] [PID:343] [RANK:0] loading tokenizer... openlm-research/open_llama_3b_v2
[2024-04-29 13:28:50,852] [DEBUG] [axolotl.load_tokenizer:279] [PID:343] [RANK:0] EOS: 2 / </s>
[2024-04-29 13:28:50,852] [DEBUG] [axolotl.load_tokenizer:280] [PID:343] [RANK:0] BOS: 1 / <s>
[2024-04-29 13:28:50,852] [DEBUG] [axolotl.load_tokenizer:281] [PID:343] [RANK:0] PAD: 2 / </s>
[2024-04-29 13:28:50,852] [DEBUG] [axolotl.load_tokenizer:282] [PID:343] [RANK:0] UNK: 0 / <unk>
[2024-04-29 13:28:50,852] [INFO] [axolotl.load_tokenizer:293] [PID:343] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2024-04-29 13:28:50,852] [DEBUG] [axolotl.train.log:61] [PID:343] [RANK:0] loading model and peft_config...
[2024-04-29 13:28:51,000] [INFO] [axolotl.load_model:359] [PID:343] [RANK:0] patching with flash attention for sample packing
[2024-04-29 13:28:51,001] [INFO] [axolotl.load_model:408] [PID:343] [RANK:0] patching _expand_mask
`low_cpu_mem_usage` was None, now set to True since model is quantized.
pytorch_model.bin: 100%|█████████████████████████████████████████████| 6.85G/6.85G [01:37<00:00, 70.1MB/s]
/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
generation_config.json: 100%|█████████████████████████████████████████████| 137/137 [00:00<00:00, 370kB/s]
[2024-04-29 13:30:31,716] [INFO] [axolotl.load_model:720] [PID:343] [RANK:0] GPU memory usage after model load: 3.430GB (+0.146GB cache, +0.569GB misc)
[2024-04-29 13:30:31,728] [INFO] [axolotl.load_model:771] [PID:343] [RANK:0] converting PEFT model w/ prepare_model_for_kbit_training
[2024-04-29 13:30:31,730] [INFO] [axolotl.load_model:780] [PID:343] [RANK:0] converting modules to torch.float16 for flash attention
trainable params: 12,712,960 || all params: 3,439,186,560 || trainable%: 0.36965020007521776
[2024-04-29 13:30:31,954] [INFO] [axolotl.load_model:825] [PID:343] [RANK:0] GPU memory usage after adapters: 3.478GB (+0.911GB cache, +0.569GB misc)
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-29 13:30:32,017] [INFO] [axolotl.train.log:61] [PID:343] [RANK:0] Pre-saving adapter config to ./lora-out
[2024-04-29 13:30:32,020] [INFO] [axolotl.train.log:61] [PID:343] [RANK:0] Starting trainer...
model.safetensors: 0%| | 0.00/6.85G [00:00<?, ?B/s][2024-04-29 13:30:32,189] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:343] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 10466111
[2024-04-29 13:30:32,226] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:343] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 10466111
[2024-04-29 13:30:32,318] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:343] [RANK:0] packing_efficiency_estimate: 0.94 total_num_tokens per device: 10466111
model.safetensors: 0%| | 10.5M/6.85G [00:00<03:41, 30.8MB/s]/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
model.safetensors: 0%|▏ | 21.0M/6.85G [00:00<02:27, 46.3MB/s]Traceback (most recent call last):
File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 59, in <module>
fire.Fire(do_cli)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
return do_train(parsed_cfg, parsed_cli_args)
File "/workspace/axolotl/src/axolotl/cli/train.py", line 55, in do_train
return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
File "/workspace/axolotl/src/axolotl/train.py", line 163, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 1837, in train
return inner_training_loop(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 2181, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 3116, in training_step
loss = self.compute_loss(model, inputs)
File "/workspace/axolotl/src/axolotl/core/trainer_builder.py", line 492, in compute_loss
return super().compute_loss(model, inputs, return_outputs=return_outputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/trainer.py", line 3139, in compute_loss
outputs = model(**inputs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 822, in forward
return model_forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/operations.py", line 810, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
return func(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/peft/peft_model.py", line 1129, in forward
return self.base_model(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
return self.model.forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1189, in forward
outputs = self.model(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/workspace/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 809, in llama_model_forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
return fn(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
outputs = run_function(*args)
File "/workspace/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 803, in custom_forward
return module(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/workspace/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 902, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
output = module._old_forward(*args, **kwargs)
File "/workspace/axolotl/src/axolotl/monkeypatch/llama_attn_hijack_flash.py", line 478, in flashattn_forward
output = flash_attn_varlen_qkvpacked_func(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 887, in flash_attn_varlen_qkvpacked_func
return FlashAttnVarlenQKVPackedFunc.apply(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/autograd/function.py", line 539, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 288, in forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_varlen_forward(
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 85, in _flash_attn_varlen_forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.varlen_fwd(
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
model.safetensors: 100%|█████████████████████████████████████████████| 6.85G/6.85G [01:52<00:00, 61.0MB/s]
0%| | 0/21524 [01:52<?, ?it/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1057, in launch_command
simple_launcher(args)
File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 673, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/miniconda3/envs/py3.10/bin/python3', '-m', 'axolotl.cli.train', 'examples/openllama-3b/lora.yml']' returned non-zero exit status 1.
Also, `apt-get install python-is-python3` helped fix another problem I was having.
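Putting those two tweaks together for reference (the comments are my interpretation of what each does, so treat them as assumptions):
export BNB_CUDA_VERSION=                 # clear any bitsandbytes CUDA-version override
apt-get install -y python-is-python3     # make `python` resolve to python3
accelerate launch -m axolotl.cli.train examples/openllama-3b/lora.yml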
@MeDott29, you may be having FA issues: `RuntimeError: FlashAttention only supports Ampere GPUs or newer.`
@amitagh, did you try other example configs?
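For the FlashAttention error on pre-Ampere GPUs (e.g. T4/V100), one workaround to try, sketched against the config keys used earlier in this thread (not a confirmed fix), is to turn flash attention off:
flash_attention: false
# The log above shows sample packing is patched via flash attention, so this may
# need to be disabled as well (an assumption, not verified):
sample_packing: false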
Running colab (T4 free version) example produces the following error:
File "/content/src/axolotl/src/axolotl/train.py", line 170, in train trainer.train(resume_from_checkpoint=resume_from_checkpoint) File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1859, in train return inner_training_loop( File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2249, in inner_training_loop grad_norm = self.accelerator.clip_grad_norm( File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2269, in clip_grad_norm self.unscale_gradients() File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2219, in unscale_gradients self.scaler.unscale_(opt) File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 307, in unscale_ optimizer_state["found_inf_per_device"] = self.unscale_grads( File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 248, in unscale_grads torch.amp_foreach_non_finite_check_and_unscale( RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
Speculation:
This might be due to the T4 not supporting bfloat16, although disabling it produces the same error.
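If that speculation is right, one thing to try (a sketch based on the precision keys in the config earlier in this thread, not a confirmed fix) is to force fp16 mixed precision and keep bf16 off on a T4:
bf16: false   # T4 (Turing) has no bfloat16 support
fp16: true
tf32: false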
I have hit exactly the same error as jaydeepthik's while running Axolotl training in a Colab notebook with an L4 GPU.
File "/usr/local/lib/python3.10/dist-packages/torch/amp/grad_scaler.py", line 278, in _unscale_grads_
torch._amp_foreach_non_finite_check_and_unscale_(
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
The error occurred in the following code cell.
!accelerate launch -m axolotl.cli.train /content/llama-3-8b-Instruct-bnb-4bit-qlora.yaml
Would you take a look?
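For what it's worth, a quick check of whether the runtime's GPU reports bfloat16 support (a generic PyTorch diagnostic, not something from the notebook):
python -c "import torch; print(torch.cuda.get_device_name(0), torch.cuda.is_bf16_supported())"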
@chdaesung @jaydeepthik , sorry, I missed the notifications. Do you have a sample Colab notebook? I was able to run successfully on Colab these past weeks.
Hi, we've had a major overhaul since then, and Axolotl installation is now much easier. Here is an updated Colab notebook that details installation: https://github.com/axolotl-ai-cloud/axolotl/blob/main/examples/colab-notebooks/colab-axolotl-example.ipynb