multi gpu - transformers/modeling_utils.py - Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([0])), this look incorrect.

Open · manishiitg opened this issue on Feb 1, 2024 · 3 comments

Please check that this issue hasn't been reported before.

  • [X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Training should start normally: the base model should load across the GPUs with accelerate + DeepSpeed ZeRO-3, and QLoRA fine-tuning should begin.

Current behaviour

Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 59, in <module>
    fire.Fire(do_cli)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/workspace/axolotl/src/axolotl/cli/train.py", line 55, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/workspace/axolotl/src/axolotl/train.py", line 80, in train
    model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
  File "/workspace/axolotl/src/axolotl/utils/models.py", line 624, in load_model
    raise err
  File "/workspace/axolotl/src/axolotl/utils/models.py", line 585, in load_model
    model = getattr(transformers, model_type).from_pretrained(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3504, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3924, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 310, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([0])), this look incorrect.
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
[2024-02-01 07:03:05,572] [ERROR] [axolotl.load_model:623] [PID:77] [RANK:0] Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([0])), this look incorrect.
Traceback (most recent call last):
  File "/workspace/axolotl/src/axolotl/utils/models.py", line 585, in load_model
    model = getattr(transformers, model_type).from_pretrained(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3504, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3924, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 805, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 310, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([32000, 4096]) in "weight" (which has shape torch.Size([0])), this look incorrect.
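
For context, the ValueError is raised by a plain shape comparison inside accelerate's set_module_tensor_to_device. The sketch below is only an illustration of that guard, under the assumption (consistent with ZeRO-3 partitioning) that the destination parameter's local shard has already been emptied to torch.Size([0]) before the full embedding weight from the checkpoint shard is copied in; it is not axolotl's or accelerate's actual loader code.

# Minimal sketch (assumption: mirrors accelerate's shape guard, not a verbatim copy).
# Under DeepSpeed ZeRO-3, partitioned parameters are left with an empty local shard
# (shape torch.Size([0])), so copying the full [32000, 4096] embedding weight from a
# checkpoint shard into one trips the guard below.
import torch
import torch.nn as nn

module = nn.Embedding(32000, 4096)
module.weight.data = torch.empty(0)   # emulate a ZeRO-3-partitioned (empty) parameter
new_value = torch.empty(32000, 4096)  # tensor arriving from the checkpoint shard

old_value = module.weight
if old_value.shape != new_value.shape:
    raise ValueError(
        f'Trying to set a tensor of shape {new_value.shape} in "weight" '
        f"(which has shape {old_value.shape}), this look incorrect."
    )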

Steps to reproduce

!docker run --gpus all \
  -v /root/.cache:/root/.cache \
  -v /home/gcpuser/sky_workdir:/sky_workdir \
  winglian/axolotl:main-py3.10-cu118-2.0.1 \
  accelerate launch -m axolotl.cli.train /sky_workdir/hi-qlora-hi-2.yaml --deepspeed /sky_workdir/zero3_bf16.json
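
The contents of /sky_workdir/zero3_bf16.json are not included in the report. The dict below is purely a hypothetical example of what a ZeRO-3 + bf16 DeepSpeed config of that name typically looks like (loosely modeled on the zero3_bf16.json style shipped with axolotl), written out in Python for illustration; the actual file may differ.

# Hypothetical only: the real /sky_workdir/zero3_bf16.json is not part of this report.
import json

zero3_bf16 = {
    "zero_optimization": {
        "stage": 3,
        "overlap_comm": True,
        "contiguous_gradients": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": True},
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "wall_clock_breakdown": False,
}
print(json.dumps(zero3_bf16, indent=2))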

Config yaml

base_model: teknium/OpenHermes-2.5-Mistral-7B
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true

load_in_8bit: false
load_in_4bit: true
strict: false

chat_template: chatml
datasets:
  - path: manishiitg/chat-instruct-hi-v4
    type: completion

hub_model_id: manishiitg/open-aditi-chat-hi-1.5
hf_use_auth_token: true

wandb_project: open-aditi-chat-hi-1.5

dataset_prepared_path: manishiitg
push_dataset_to_hub: manishiitg
val_set_size: 0
output_dir: /sky-notebook/manishiitg/open-aditi-chat-hi-1.5

adapter: qlora
lora_model_dir:
save_safetensors: true

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

lora_modules_to_save:
  - embed_tokens
  - lm_head

wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 9
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

adam_beta2: 0.95
adam_epsilon: 0.00001
max_grad_norm: 1.0

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
auto_resume_from_checkpoints: true  ## manage check point resume from here
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 0
eval_table_size:
eval_table_max_new_tokens: 128
save_steps: 20  ## increase based on your dataset
save_strategy: steps
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: ""
  eos_token: ""
  unk_token: ""
tokens: # these are delimiters
  - "<|im_start|>"
  - "<|im_end|>"

Possible solution

No response

Which Operating Systems are you using?

  • [X] Linux
  • [ ] macOS
  • [ ] Windows

Python Version

3.10

axolotl branch-commit

main

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this bug has not been reported yet.
  • [X] I am using the latest version of axolotl.
  • [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

manishiitg · Feb 01 '24 07:02