Results 3 comments of Karl.LM

> Combining ZeRO2 with PP is not mechanistically efficient. ZeRO2 has to split the gradients, but PP has to accumulate the gradients, so there's no real performance boost, it's actually...

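Also: 1. Once the changes are done, update test_matmul.py and run it. 2. I still don't understand why input can't be passed in as an external input; please explain that again.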

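1. The name Symbol is odd. As I understand it, the purpose of this PR is to serialize/persist OpNode, so I suggest renaming it to OpNodeProto. 2. Don't put it in job.proto; create a new file for it instead.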