On the parallel setting
I found the following code in arguments.py:

    if args.model_parallel_size <= 2:
        set_context_parallel_group(args.model_parallel_size, mpu.get_model_parallel_group())
    else:
        initialize_context_parallel(2)
My questions are:
(1) Are the two concepts "model_parallel" and "context_parallel" the same in your code? If they are, it might be better to use a single name, e.g. "set_context_parallel_group(args.context_parallel_size, mpu.get_context_parallel_group())", since "model parallel" is more ambiguous with regard to the sharding dimension.
(2) The logic of the code snippet above is not clear to me. Do you intend to restrict the maximum model_parallel_size to 2? Why not allow a larger context-parallel group, such as 4 or 8, to enable longer generation?
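For what it's worth, combining suggestion (1) with a configurable group size might look roughly like the sketch below. The key name context_parallel_size and the assumption that initialize_context_parallel accepts sizes other than 2 are my own, not confirmed behavior of the repository:

```python
# Hypothetical sketch, not the repository's code: expose the group size as an
# explicit context-parallel option instead of reusing model_parallel_size.
# Whether initialize_context_parallel accepts sizes larger than 2 is an
# assumption that would need to be verified.
def setup_context_parallel(args, mpu):
    if args.context_parallel_size <= 2:
        # Reuse the existing model-parallel communication group.
        set_context_parallel_group(args.context_parallel_size,
                                   mpu.get_model_parallel_group())
    else:
        # Build a dedicated group, e.g. 4- or 8-way for longer sequences.
        initialize_context_parallel(args.context_parallel_size)
```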
Hello, and thank you to the CogVideo team for open-sourcing this project. We tried the models in the project and the results are impressive.
We noticed that the variable naming in your code seems to treat model_parallel and context_parallel as equivalent. If so, it might be more appropriate to rename the key in the config to context_parallel, since model_parallel has many possible implementations, whereas context_parallel avoids the ambiguity.
Also, the deepspeed-ulysses and Megatron implementations of context parallelism are different. Does your team's parallel wrapper adopt the former? https://github.com/microsoft/DeepSpeed/blob/master/blogs/deepspeed-ulysses/README.md https://docs.nvidia.com/megatron-core/developer-guide/latest/api-guide/context_parallel.html
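For reference, here is a rough sketch of the DeepSpeed-Ulysses idea written against plain torch.distributed; the function name and tensor shapes are my own assumptions for illustration, not DeepSpeed's or CogVideo's actual code. Sequence-sharded activations are exchanged via all-to-all so that each rank attends over the full sequence but only a subset of heads, whereas Megatron-Core's context parallel keeps all heads local and circulates key/value blocks around a ring of ranks during attention.

```python
# Minimal sketch of Ulysses-style sequence parallelism via all-to-all.
# Assumes torch.distributed is already initialized with a backend that
# supports all_to_all_single (e.g. NCCL) and that rank i holds the i-th
# contiguous shard of the sequence.
import torch
import torch.distributed as dist

def ulysses_all_to_all(x: torch.Tensor, group=None) -> torch.Tensor:
    """Re-shard from (local_seq, num_heads, head_dim), sharded over sequence,
    to (full_seq, num_heads // world_size, head_dim), sharded over heads."""
    world_size = dist.get_world_size(group)
    local_seq, num_heads, head_dim = x.shape
    # Split the head dimension into one chunk per rank.
    x = x.reshape(local_seq, world_size, num_heads // world_size, head_dim)
    x = x.permute(1, 0, 2, 3).contiguous()        # (world, local_seq, heads/world, dim)
    out = torch.empty_like(x)
    # Exchange: send head-chunk j to rank j, receive rank i's sequence shard
    # for this rank's head chunk.
    dist.all_to_all_single(out, x, group=group)
    # Concatenate the received sequence shards along the sequence axis.
    return out.reshape(world_size * local_seq, num_heads // world_size, head_dim)
```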
Finally, I don't quite understand the logic of this snippet:

    if args.model_parallel_size <= 2:
        set_context_parallel_group(args.model_parallel_size, mpu.get_model_parallel_group())
    else:
        initialize_context_parallel(2)

Is this intentionally limiting the context-parallel size to at most 2?
This part means that the model parallel group in the transformer and the context parallel group in the VAE share the same communication group. The transformer part of the currently open-sourced code does not support context parallel.
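To make the answer above concrete, here is a rough sketch of what "sharing one communication group" between the transformer's model parallel and the VAE's context parallel could look like in plain torch.distributed. The group-creation logic and names below are assumptions for illustration, not the repository's actual code:

```python
# Hypothetical sketch: one process group is created for model parallelism,
# and the same handle is handed to the VAE's context-parallel utilities,
# so no separate group needs to be initialized when model_parallel_size <= 2.
import torch.distributed as dist

def build_shared_group(model_parallel_size: int):
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    shared_group = None
    # Partition ranks into consecutive blocks of size model_parallel_size.
    for start in range(0, world_size, model_parallel_size):
        ranks = list(range(start, start + model_parallel_size))
        group = dist.new_group(ranks)  # every rank must call new_group for every group
        if rank in ranks:
            shared_group = group
    return shared_group

# The transformer would use `shared_group` for model parallel, and the VAE
# would be pointed at the same group for its context-parallel communication,
# e.g. via something like set_context_parallel_group(size, shared_group).
```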