ColossalAI
[BUG]: fp32 param and grad have different shape torch.Size([5064704]) vs torch.Size([128000]) when using lora_rank=4 at stage 1
🐛 Describe the bug
Traceback (most recent call last):
  File "train_sft.py", line 175, in
    optimizer.step()
  File "/usr/local/lib/python3.8/dist-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/home/luban/.local/lib/python3.8/site-packages/colossalai/zero/sharded_optim/low_level_optim.py", line 467, in step
    assert param_shape == flat_fp32_avg_grads.shape, \
AssertionError: fp32 param and grad have different shape torch.Size([5064704]) vs torch.Size([128000])

(The same assertion has also failed with torch.Size([5073920]) vs torch.Size([16384]).)
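For context, the failing check in low_level_optim.py compares the flat fp32 master copy of a parameter group against the flattened fp16 gradients gathered at step time. The sketch below only paraphrases that comparison (it is not the actual ColossalAI source); the tensor sizes are taken from the error above, and reading the smaller size as "only the LoRA adapters produced gradients" is an assumption:

```python
import torch

# Flat fp32 master copy of the parameter group, built from every parameter the
# optimizer manages (5,064,704 elements in the error above).
flat_fp32_params = torch.zeros(5_064_704)

# Flattened fp16 gradients collected at step time. If most base weights never
# receive gradients (e.g. only the small LoRA adapters are trainable, 128,000
# elements above), this tensor is far smaller and the shapes no longer match.
flat_fp16_avg_grads = torch.zeros(128_000, dtype=torch.float16)
flat_fp32_avg_grads = flat_fp16_avg_grads.to(torch.float32)

# Running this sketch reproduces an AssertionError with the same message as the traceback.
assert flat_fp32_params.shape == flat_fp32_avg_grads.shape, \
    f'fp32 param and grad have different shape {flat_fp32_params.shape} vs {flat_fp32_avg_grads.shape}'
```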
Environment
No response
I am having the same issue here
I have a similar issue too:
│ 153 │ def optimizer_step(self, optimizer: optim.Optimizer, **kwargs) -> None: │
│ ❱ 154 │ │ optimizer.step() │
│ 155 │ │
│ 156 │ @staticmethod │
│ 157 │ def _unwrap_actor(actor: Actor) -> nn.Module: │
│ │
│ /opt/conda/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:65 in wrapper │
│ │
│ 62 │ │ │ │ instance = instance_ref() │
│ 63 │ │ │ │ instance._step_count += 1 │
│ 64 │ │ │ │ wrapped = func.__get__(instance, cls) │
│ ❱ 65 │ │ │ │ return wrapped(*args, **kwargs) │
│ 66 │ │ │ │
│ 67 │ │ │ # Note that the returned function here is no longer a bound method, │
│ 68 │ │ │ # so attributes like `__func__` and `__self__` no longer exist. │
│ │
│ /opt/conda/lib/python3.9/site-packages/colossalai/zero/sharded_optim/low_level_optim.py:467 in │
│ step │
│ │
│ 464 │ │ │ flat_fp32_avg_grads = flat_fp16_avg_grads.to(dtype) │
│ 465 │ │ │ │
│ 466 │ │ │ param_shape = self._fp32_flat_param_groups_of_current_rank[group_id].shape │
│ ❱ 467 │ │ │ assert param_shape == flat_fp32_avg_grads.shape, \ │
│ 468 │ │ │ │ f'fp32 param and grad have different shape {param_shape} vs {flat_fp32_a │
│ 469 │ │ │ │
│ 470 │ │ │ single_grad_partition_groups.append(flat_fp32_avg_grads) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: fp32 param and grad have different shape torch.Size([10138624]) vs torch.Size([144384])
I am having the same issue here
I am having the same issue here
The same issue too. How do you solve this issue?
The same issue too.
The same issue too.
This error occurs when I use --lora_rank and --grad_checkpoint together. As a workaround, use either --lora_rank or --grad_checkpoint, but not both.
The same issue too.
The same issue too.
The same issue too.
"AssertionError: fp32 param and grad have different shape" — I have solved this error. I was using GLM-10B to train a reward model, and the 'mems' output was used as last_hidden_states. However, 'mems' is detached, which removes it from the computation graph, so gradients cannot propagate back into the model. Therefore, check your last_hidden_states and make sure it is still part of the computation graph.
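A small, self-contained illustration of the check suggested above; it only demonstrates the detach() behaviour and does not reproduce the GLM-10B reward-model code itself:

```python
import torch

# Toy illustration: detach() removes a tensor from the autograd graph.
w = torch.randn(4, 4, requires_grad=True)
hidden = torch.tanh(w @ torch.randn(4, 4))

attached = hidden            # still has a grad_fn, so gradients can flow back to w
detached = hidden.detach()   # grad_fn is None, so gradients cannot reach w

assert attached.grad_fn is not None
assert detached.grad_fn is None

# In a reward model, the tensor used as last_hidden_states should behave like
# `attached`, not `detached` (GLM's cached 'mems' is detached, per the comment above).
```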