[BUG]: shardformer: pipeline forward error with customized layer distribution
🐛 Describe the bug
Hi, I am trying to implement a custom shard policy with a different layer distribution, but it seems all built-in policies share the following inconsistent implementation.

In `get_held_layers()`, a policy uses `self.distribute_layers()` and `self.get_stage_index()`, which are customizable:

https://github.com/hpcaitech/ColossalAI/blob/79718fae04fc4461a35ae80ab87f52b64260f394/colossalai/shardformer/policies/gpt2.py#L170-L175

But in `set_pipeline_forward()`, the policy uses `Policy.distribute_layers()` and `Policy.get_stage_index()`:

https://github.com/hpcaitech/ColossalAI/blob/79718fae04fc4461a35ae80ab87f52b64260f394/colossalai/shardformer/policies/gpt2.py#L192-L193

so if these functions are overridden, the two call sites disagree and pipeline forward fails due to inconsistent layer assignment.
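The root cause is ordinary Python name resolution: `Policy.distribute_layers(...)` names the base-class staticmethod directly, so a subclass override is never consulted, whereas `self.distribute_layers(...)` resolves through the subclass even for staticmethods. A minimal standalone sketch (the even split below is only a stand-in for the real base implementation):

```python
from typing import List

class Policy:
    @staticmethod
    def distribute_layers(num_layers: int, num_stages: int) -> List[int]:
        # Stand-in for the real base implementation: roughly even split.
        q, r = divmod(num_layers, num_stages)
        return [q + (1 if i < r else 0) for i in range(num_stages)]

class CustomPolicy(Policy):
    @staticmethod
    def distribute_layers(num_layers: int, num_stages: int) -> List[int]:
        layers_per_stage = Policy.distribute_layers(num_layers - 4, num_stages)
        layers_per_stage[0] += 4
        return layers_per_stage

policy = CustomPolicy()
print(policy.distribute_layers(12, 4))  # [6, 2, 2, 2] -- what get_held_layers() uses
print(Policy.distribute_layers(12, 4))  # [3, 3, 3, 3] -- what set_pipeline_forward() uses
```

One natural fix is for `set_pipeline_forward()` to resolve the two helpers through `self`, as `get_held_layers()` already does.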
How to reproduce
I tested with `examples/language/gpt/hybridparallelism/finetune.py`.
For the `hybrid_parallel` plugin, add a custom policy:
```python
elif args.plugin == "hybrid_parallel":
    BATCH_SIZE = 128

    from typing import List

    from colossalai.shardformer.policies.base_policy import Policy
    from colossalai.shardformer.policies.gpt2 import GPT2ForSequenceClassificationPolicy

    class CustomGPT2Policy(GPT2ForSequenceClassificationPolicy):
        @staticmethod
        def distribute_layers(num_layers: int, num_stages: int) -> List[int]:
            # Distribute all but 4 layers evenly, then give the extra 4 to the first stage.
            layers_per_stage = Policy.distribute_layers(num_layers - 4, num_stages)
            layers_per_stage[0] += 4
            return layers_per_stage

    plugin = HybridParallelPlugin(
        tp_size=1,
        pp_size=4,
        num_microbatches=None,
        microbatch_size=8,
        zero_stage=0,
        precision="fp16",
        initial_scale=1,
        custom_policy=CustomGPT2Policy(),
    )
```
which distributes layers in a slightly different way: the first stage gets 4 more layers. This leads to the following error:
```
...
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 312, in forward
    query, key, value = self.c_attn(hidden_states).split(self.split_size, dim=2)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/pytorch_utils.py", line 107, in forward
    x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
TypeError: addmm(): argument 'input' (position 1) must be Tensor, not NoneType
```
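The `NoneType` surfaces because the two call sites now disagree about stage boundaries: each rank materializes the layers named by the overridden distribution, but runs its forward over the index range computed from the base distribution, so some ranks step into transformer blocks whose parameters were never kept on that rank (hence `self.bias` is `None`). Continuing the sketch above with 12 layers and 4 stages, the per-stage `[start, end)` ranges diverge (`stage_ranges` is a hypothetical mirror of what `Policy.get_stage_index()` computes):

```python
from itertools import accumulate

def stage_ranges(layers_per_stage):
    # [start, end) layer indices assigned to each pipeline stage.
    ends = list(accumulate(layers_per_stage))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))

print(stage_ranges([6, 2, 2, 2]))  # held layers:    [(0, 6), (6, 8), (8, 10), (10, 12)]
print(stage_ranges([3, 3, 3, 3]))  # forward ranges: [(0, 3), (3, 6), (6, 9), (9, 12)]
# e.g. stage 1 holds layers 6-7 but is asked to run layers 3-5,
# whose parameters it never materialized.
```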
Environment
torch 2.1.0+cu118
Thanks for reporting. Would you like to submit a PR to solve this issue? :)
Submitted!
Sorry for the delayed update; I was assigned to another task for the last several months. This issue is now resolved.