Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Please check that this issue hasn't been reported before.
- [X] I searched previous bug reports and didn't find any similar reports.
Expected Behavior
The training task should start without errors.
Current behaviour
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
[2024-04-17 00:08:54,225] [INFO] [axolotl.load_model:354] [PID:808742] [RANK:2] patching with flash attention for sample packing
[2024-04-17 00:08:54,225] [INFO] [axolotl.load_model:354] [PID:808744] [RANK:4] patching with flash attention for sample packing
[2024-04-17 00:08:54,230] [INFO] [axolotl.load_model:354] [PID:808743] [RANK:3] patching with flash attention for sample packing
[2024-04-17 00:08:54,246] [INFO] [axolotl.scripts.load_datasets:415] [PID:808746] [RANK:6] printing prompters...
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808743] [RANK:3] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808742] [RANK:2] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808744] [RANK:4] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808741] [RANK:1] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808745] [RANK:5] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808748] [RANK:7] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808742] [RANK:2] patching _expand_mask
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808743] [RANK:3] patching _expand_mask
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808745] [RANK:5] patching _expand_mask
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808744] [RANK:4] patching _expand_mask
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808741] [RANK:1] patching _expand_mask
[2024-04-17 00:08:54,249] [INFO] [axolotl.load_model:403] [PID:808748] [RANK:7] patching _expand_mask
[2024-04-17 00:08:54,287] [DEBUG] [axolotl.load_tokenizer:277] [PID:808746] [RANK:6] EOS: 2 / </s>
[2024-04-17 00:08:54,287] [DEBUG] [axolotl.load_tokenizer:278] [PID:808746] [RANK:6] BOS: 1 / <s>
[2024-04-17 00:08:54,287] [DEBUG] [axolotl.load_tokenizer:279] [PID:808746] [RANK:6] PAD: 2 / </s>
[2024-04-17 00:08:54,287] [DEBUG] [axolotl.load_tokenizer:280] [PID:808746] [RANK:6] UNK: 0 / <unk>
[2024-04-17 00:08:54,294] [INFO] [axolotl.load_model:354] [PID:808740] [RANK:0] patching with flash attention for sample packing
[2024-04-17 00:08:54,295] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808740] [RANK:0] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,295] [INFO] [axolotl.load_model:403] [PID:808740] [RANK:0] patching _expand_mask
[2024-04-17 00:08:54,342] [INFO] [axolotl.load_model:354] [PID:808746] [RANK:6] patching with flash attention for sample packing
[2024-04-17 00:08:54,343] [INFO] [axolotl.replace_llama_attn_with_flash_attn:133] [PID:808746] [RANK:6] optimized flash-attention RMSNorm not found (run `pip install 'git+https://github.com/Dao-AILab/flash-attention.git#egg=dropout_layer_norm&subdirectory=csrc/layer_norm'`)
[2024-04-17 00:08:54,343] [INFO] [axolotl.load_model:403] [PID:808746] [RANK:6] patching _expand_mask
[2024-04-17 00:09:05,981] [INFO] [partition_parameters.py:349:__exit__] finished initializing model - num_params = 723, num_elems = 68.98B
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00, 1.12s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00, 1.12s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00, 1.12s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00, 1.12s/it]
[2024-04-17 00:09:38,492] [INFO] [axolotl.load_model:597] [PID:808743] [RANK:3] patching with SwiGLU
[2024-04-17 00:09:38,493] [INFO] [axolotl.load_model:597] [PID:808741] [RANK:1] patching with SwiGLU
[2024-04-17 00:09:38,495] [INFO] [axolotl.load_model:597] [PID:808744] [RANK:4] patching with SwiGLU
[2024-04-17 00:09:38,495] [INFO] [axolotl.load_model:597] [PID:808742] [RANK:2] patching with SwiGLU
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00, 1.12s/it]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00, 1.12s/it]
[2024-04-17 00:09:38,511] [INFO] [axolotl.load_model:597] [PID:808745] [RANK:5] patching with SwiGLU
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00, 1.12s/it]
[2024-04-17 00:09:38,513] [INFO] [axolotl.load_model:597] [PID:808748] [RANK:7] patching with SwiGLU
[2024-04-17 00:09:38,518] [INFO] [axolotl.load_model:597] [PID:808746] [RANK:6] patching with SwiGLU
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 29/29 [00:32<00:00, 1.12s/it]
[2024-04-17 00:09:38,563] [INFO] [axolotl.load_model:597] [PID:808740] [RANK:0] patching with SwiGLU
[2024-04-17 00:14:54,032] [INFO] [axolotl.load_model:715] [PID:808741] [RANK:1] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:54,036] [INFO] [axolotl.load_model:775] [PID:808741] [RANK:1] converting modules to torch.bfloat16 for flash attention
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:54,466] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808741] [RANK:1] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:54,538] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808741] [RANK:1] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:54,615] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808741] [RANK:1] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:54,836] [INFO] [axolotl.load_model:715] [PID:808748] [RANK:7] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.373GB misc)
[2024-04-17 00:14:54,841] [INFO] [axolotl.load_model:775] [PID:808748] [RANK:7] converting modules to torch.bfloat16 for flash attention
[2024-04-17 00:14:54,870] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808741] [RANK:1] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:55,271] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808748] [RANK:7] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:55,342] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808748] [RANK:7] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:55,414] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808748] [RANK:7] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:55,650] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808748] [RANK:7] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:55,666] [INFO] [axolotl.load_model:715] [PID:808745] [RANK:5] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:55,670] [INFO] [axolotl.load_model:775] [PID:808745] [RANK:5] converting modules to torch.bfloat16 for flash attention
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:55,997] [INFO] [axolotl.load_model:715] [PID:808740] [RANK:0] GPU memory usage after model load: 0.625GB (+1.723GB cache, +3.498GB misc)
[2024-04-17 00:14:56,002] [INFO] [axolotl.load_model:775] [PID:808740] [RANK:0] converting modules to torch.bfloat16 for flash attention
[2024-04-17 00:14:56,051] [INFO] [axolotl.load_model:715] [PID:808744] [RANK:4] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:56,055] [INFO] [axolotl.load_model:775] [PID:808744] [RANK:4] converting modules to torch.bfloat16 for flash attention
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:56,110] [WARNING] [accelerate.utils.other.log:61] [PID:808740] Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[2024-04-17 00:14:56,122] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808745] [RANK:5] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,143] [INFO] [axolotl.train.log:61] [PID:808740] [RANK:0] Starting trainer...
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:56,195] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808745] [RANK:5] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,271] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808745] [RANK:5] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,467] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808740] [RANK:0] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,518] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808744] [RANK:4] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,526] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808745] [RANK:5] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,538] [INFO] [axolotl.load_model:715] [PID:808746] [RANK:6] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:56,542] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808740] [RANK:0] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,543] [INFO] [axolotl.load_model:775] [PID:808746] [RANK:6] converting modules to torch.bfloat16 for flash attention
[2024-04-17 00:14:56,599] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808744] [RANK:4] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,617] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808740] [RANK:0] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:56,676] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808744] [RANK:4] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,704] [INFO] [axolotl.load_model:715] [PID:808742] [RANK:2] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:56,709] [INFO] [axolotl.load_model:775] [PID:808742] [RANK:2] converting modules to torch.bfloat16 for flash attention
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:56,872] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808740] [RANK:0] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,932] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808744] [RANK:4] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:56,934] [WARNING] [engine.py:1179:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****
[2024-04-17 00:14:56,975] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808746] [RANK:6] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,051] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808746] [RANK:6] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,123] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808746] [RANK:6] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,164] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808742] [RANK:2] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,240] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808742] [RANK:2] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,315] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808742] [RANK:2] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
Parameter Offload: Total persistent parameters: 1318912 in 321 params
[2024-04-17 00:14:57,363] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808746] [RANK:6] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:57,570] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808742] [RANK:2] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:58,305] [INFO] [axolotl.load_model:715] [PID:808743] [RANK:3] GPU memory usage after model load: 0.625GB (+1.723GB cache, +2.514GB misc)
[2024-04-17 00:14:58,310] [INFO] [axolotl.load_model:775] [PID:808743] [RANK:3] converting modules to torch.bfloat16 for flash attention
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
[2024-04-17 00:14:58,743] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808743] [RANK:3] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:58,817] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808743] [RANK:3] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:58,890] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808743] [RANK:3] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
[2024-04-17 00:14:59,135] [INFO] [axolotl.utils.samplers.multipack._len_est:184] [PID:808743] [RANK:3] packing_efficiency_estimate: 0.9 total_num_tokens per device: 8223767
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
Error invalid configuration argument at line 119 in file /src/csrc/ops.cu
[2024-04-17 00:15:20,046] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0
Steps to reproduce
I trained the CodeLlama-70b model using 8x A100 80GB GPUs. I performed a full fine-tune and used the following shell command to start the training process:
accelerate launch -m axolotl.cli.train examples/code-llama/70b/fft_optimized.yml --debug
Config yaml
base_model: /mnt/models/CodeLlama-70b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
load_in_8bit: false
load_in_4bit: false
strict: false
datasets:
- path: xxx
type:
field_instruction: instruction
field_output: response
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: /mnt/output
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
chat_template: chatml
adapter:
lora_model_dir:
lora_r:
lora_alpha:
lora_dropout:
lora_target_linear:
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00005
train_on_inputs: false
group_by_length: false
bf16: true
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
flash_attn_cross_entropy: false
flash_attn_rms_norm: true
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: true
warmup_steps: 200
evals_per_epoch: 4
eval_table_size:
saves_per_epoch: 1
debug:
deepspeed: deepspeed_configs/zero3_bf16_cpuoffload_params.json # multi-gpu only
weight_decay: 0.1
fsdp:
fsdp_config:
special_tokens:
Possible solution
No response
Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [ ] Windows
Python Version
3.11.5
axolotl branch-commit
main/132eb740f036eff0fa8b239ddaf0b7a359ed1732
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.
Try pip install -U deepspeed.
This solved a similar problem with Mistral 7B.
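For anyone else hitting this, a minimal sketch of the suggested upgrade (which deepspeed release, if any, actually resolves the error is not confirmed in this thread):
# Upgrade deepspeed and confirm the installed version
pip install -U deepspeed
python -c "import deepspeed; print(deepspeed.__version__)"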
@jaywongs , did the above solve it for you? I find this issue dependent on machine. It may also be bitsandbytes issue.
@jaywongs , did the above solve it for you? I find this issue dependent on machine. It may also be bitsandbytes issue.
Yes, that solved it for me!
@jaywongs , did the above solve it for you? I find this issue dependent on machine. It may also be bitsandbytes issue.
Apologies for the delayed response. I have tried using the latest version of deepspeed, but the error persists.
@jaywongs , did upgrading deepspeed work for you?
@jaywongs , did upgrading deepspeed work for you?
It did not work for me; I am using deepspeed 0.14.2.
@jaywongs , did upgrading deepspeed work for you?
It did not work for me; I am using deepspeed 0.14.2.
Hello, have you solved it? I also encountered the same problem.
@jaywongs , did upgrading deepspeed work for you?
It did not work for me; I am using deepspeed 0.14.2.
Hello, have you solved it? I also encountered the same problem.
Unfortunately, I was unable to solve it in the end.
Same error here.
Error invalid configuration argument at line 218 in file /src/csrc/ops.cu
I used the winglian/axolotl:main-latest Docker image, and my configuration is shown below:
**** Axolotl Dependency Versions *****
accelerate: 0.33.0
peft: 0.12.0
transformers: 4.44.0
trl: 0.9.6
torch: 2.3.1+cu121
bitsandbytes: 0.43.3
****************************************
deepspeed: 0.15.0
Hey everyone, apologies for taking so long to circle back to this. Unfortunately, I could not reproduce this issue on RunPod nodes. I used winglian/axolotl-cloud:main-latest on 2x A40 and did not encounter this issue with QLoRA configs.
Are these all from local systems or from cloud systems? If the latter, have you tried provisioning another node? Secondly, does it only happen with certain configs (large models / small models, full fine-tune / adapter)?
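If anyone can still reproduce this, here is a minimal diagnostic sketch to help narrow it down; it assumes the /src/csrc/ops.cu in the error belongs to bitsandbytes (the config above uses the adamw_bnb_8bit optimizer), which this thread does not confirm:
# Show the driver/CUDA the node exposes and the CUDA build PyTorch was compiled against
nvidia-smi
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))"
# Run the bitsandbytes self-check, which prints its CUDA setup diagnostics
python -m bitsandbytes
As a further experiment (a suggestion, not a confirmed fix), switching optimizer: adamw_bnb_8bit to a non-bitsandbytes optimizer such as adamw_torch in the config would show whether the failure is isolated to the 8-bit optimizer kernels.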