Jiarui Fang（方佳瑞） comments

Results 220 comments of Jiarui Fang（方佳瑞）

[FEATURE]: Parallel layernorm optimization

A similar method has been proposed in the [TurboTransformer Paper](https://dl.acm.org/doi/10.1145/3437801.3441578) to reduce the number of synchronizations in CUDA programming...

Need a finetuning example

I believe the feature depends on #256

Need an integrated configuration tutorial for users to customize their applications

Do we have the doc?

[BUG]: colossalai check error

@wohaocaiji @zixiliuUSC I think the import error has been fixed in the latest main branch.

How PP and ZeRO stage 2+ work together?

> I also don't think it makes sense for Colossal AI to use the name of `ShardedModel`. Because for ZeRO1 and 2 we don't actually split the model. This name...

Any checkpoint saving/loading tutorial provided?

We are planning to provide this feature this week (28th May).

[Discussion] About 3D Parallelism

I agree that 3D parallelism can shrink the peak activation footprint on one GPU at the cost of more communication. The method definitely works in some special cases. Maybe a simple searching...

[Discussion] About 3D Parallelism

@1SAA communication profiling results may support some of my assumptions in the discussion.

[BUG]: ZeRO causes runtime error when use GRU and pack sequence

I think ZeRO does not support `pack_padded_sequence` right now. RNNs usually do not have many parameters, so DP is often enough for them, and we do not test RNN...