ColossalAI [FEATURE]: set ComputePattern for op rather than parameter

Open ver217 opened this issue 2 years ago • 0 comments

Describe the feature

Currently we set the spec on the parameter, which means each parameter has a single, fixed compute_pattern. However, some models, such as GPT-2, share a parameter across different layers. GPT-2 shares the token embedding weight between the embedding layer and the classifier layer: the embedding layer calls torch.nn.functional.embedding on it, and the classifier layer calls torch.nn.functional.linear on it. As a result, one parameter may need two different compute patterns during training. Therefore, we should set the compute pattern on operations rather than on parameters.
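
To make the conflict concrete, here is a minimal pure-Python sketch. It does not use ColossalAI's actual API: `Pattern`, `Param`, `OpSpecRegistry`, and the two pattern names are hypothetical, and the choice of which pattern goes with which op is arbitrary. It only shows that a per-parameter spec can hold one pattern for the shared weight, while a spec keyed by (op, parameter) can hold one per call site.

```python
from enum import Enum


class Pattern(Enum):
    """Hypothetical stand-in for a compute pattern (not ColossalAI's enum)."""
    PATTERN_A = "a"
    PATTERN_B = "b"


class Param:
    """Hypothetical stand-in for a parameter such as a shared weight tensor."""
    def __init__(self, name):
        self.name = name


class OpSpecRegistry:
    """Stores a compute pattern per (op name, parameter) pair."""
    def __init__(self):
        self._specs = {}

    def set(self, op_name, param, pattern):
        self._specs[(op_name, id(param))] = pattern

    def get(self, op_name, param):
        return self._specs[(op_name, id(param))]


# One weight shared by the embedding layer and the classifier layer.
wte = Param("token_embedding_weight")

# Per-parameter spec: the second assignment overwrites the first,
# so only one pattern survives for the shared weight.
param_spec = {}
param_spec[id(wte)] = Pattern.PATTERN_A  # wanted by the embedding op
param_spec[id(wte)] = Pattern.PATTERN_B  # wanted by the linear op

# Per-op spec: both patterns coexist for the same parameter.
registry = OpSpecRegistry()
registry.set("embedding", wte, Pattern.PATTERN_A)
registry.set("linear", wte, Pattern.PATTERN_B)
```

With the per-op registry, a dispatcher that intercepts an operation could look up the pattern from the op name and the parameter it receives, so the shared weight is handled differently in each layer.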

May 10 '22 07:05 ver217
