Megatron-DeepSpeed Parallelize Meg CUDA Kernel build system

Open stas00 opened this issue 2 years ago • 0 comments

Building the Meg CUDA kernels takes forever because the build runs sequentially and doesn't take advantage of multiple cores. It takes some 5 minutes. And every time one changes the number of GPUs the kernels get rebuilt, which is both very unproductive and makes the CI really slow.

Need to rewrite the build to parallelize it.
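
For illustration only, here is a minimal sketch of what a parallel build driver could look like, assuming each kernel can be built independently of the others. `build_all` and the kernel names are hypothetical and not part of Megatron-DeepSpeed; in practice each callable would wrap whatever currently compiles one kernel (for example a call to `torch.utils.cpp_extension.load`, which is an assumption about how the build is invoked). Threads should be enough here if the heavy lifting happens in compiler subprocesses.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional


def build_all(builders: Dict[str, Callable[[], object]],
              max_workers: Optional[int] = None) -> Dict[str, object]:
    """Run independent build jobs concurrently and return their results by name.

    `builders` maps a (hypothetical) kernel name to a zero-argument callable
    that performs one build, e.g. a wrapper around a compiler invocation.
    """
    if not builders:
        return {}
    if max_workers is None:
        # Use one worker per job, capped at the number of available cores.
        max_workers = min(len(builders), os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every job up front so they can run at the same time.
        futures = {name: pool.submit(fn) for name, fn in builders.items()}
        # .result() waits for the job and re-raises any exception it hit.
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    # Stand-in jobs; real ones would compile a kernel and return the module.
    jobs = {name: (lambda n=name: f"built {n}")
            for name in ("kernel_a", "kernel_b", "kernel_c")}
    print(build_all(jobs))
```

This only helps if the individual builds don't write to the same build directory or lock file; if they do, each job would need its own build directory.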

Side note: apex and deepspeed have the same problem, but deepspeed supports make -j

Ideally the solution needs to come from pytorch. Perhaps if we solve it generically we could upstream the solution to pytorch core.

Oct 29 '21 15:10 stas00
