piaohe20221128

Results 2 issues of piaohe20221128

Hi, thank you for releasing the code! I have a question about the difference in the pipeline between training and inference. The paper says that in the inference stage the predicted output of every decoder layer...

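If the branches of a multi-branch block contain nonlinear operations such as activation functions or gating, can they still be merged? Thanks in advance for your answer!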