Parallel residual Transformer layer

Open cavdard opened this issue 2 years ago • 0 comments

Hi,

Is there a way to enable a parallel residual connection, similar to the use_parallel_residual config option in the Hugging Face GPT-NeoX model, to speed up training?

@ksivaman If this is not currently supported, are there any plans to support it?
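
For reference, here is a minimal sketch of the layout I mean, as I understand the GPT-NeoX behaviour. It uses toy callables on plain Python lists rather than real modules. The names attn, mlp, ln1, ln2, sequential_block, and parallel_block are hypothetical stand-ins for illustration and are not taken from any library's API.

```python
# Toy sketch only: sublayers are plain callables on lists of floats.
# attn, mlp, ln1, ln2 are hypothetical stand-ins, not a real library API.

def add(a, b):
    """Element-wise sum of two equal-length lists."""
    return [x + y for x, y in zip(a, b)]

def sequential_block(x, attn, mlp, ln1, ln2):
    # Standard pre-LayerNorm layout: the MLP branch reads the
    # result of the attention residual, so it has to wait for it.
    x = add(x, attn(ln1(x)))
    x = add(x, mlp(ln2(x)))
    return x

def parallel_block(x, attn, mlp, ln1, ln2):
    # Parallel residual: both branches read the same input x,
    # so neither branch depends on the other's output.
    return add(add(x, attn(ln1(x))), mlp(ln2(x)))

# Tiny stand-in sublayers, chosen so the two layouts give different results.
identity = lambda v: v
double = lambda v: [2.0 * t for t in v]
plus_one = lambda v: [t + 1.0 for t in v]

x = [1.0, 2.0]
seq = sequential_block(x, double, plus_one, identity, identity)
par = parallel_block(x, double, plus_one, identity, identity)
print(seq, par)
```

The two layouts are not numerically equivalent, so this would need to be an explicit option rather than a transparent optimization. The potential speedup, as far as I understand, comes from the attention and MLP branches being independent in the parallel form.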

Sep 19 '23 20:09 cavdard