pytorch
Parallelize the build by splitting template instantiations into multiple files.
This reduces the wall-clock build time from 19 minutes to 12 minutes on a dual AMD EPYC 7542 server.
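A minimal sketch of the general idea, not the code from this PR. It assumes a hypothetical templated function `my_kernel` whose full definition lives in a hypothetical header `my_kernel.h`. The explicit instantiations are distributed round-robin across several generated `.cpp` files, so the build system can compile them as independent translation units on separate cores instead of in one large file.

```python
from pathlib import Path

# Hypothetical example: explicit instantiation definitions of a templated
# function `my_kernel` (defined in the hypothetical header "my_kernel.h").
# An explicit instantiation definition needs the template's full definition
# to be visible, so every generated file includes that header.
INSTANTIATIONS = [
    f"template void my_kernel<{t}>(const {t}*, {t}*, int);"
    for t in ("float", "double", "int", "long", "short", "char")
]


def shard(items, num_shards):
    """Split items round-robin into num_shards lists (some may be empty)."""
    if num_shards < 1:
        raise ValueError("num_shards must be >= 1")
    shards = [[] for _ in range(num_shards)]
    for i, item in enumerate(items):
        shards[i % num_shards].append(item)
    return shards


def render_shards(header, items, num_shards, stem="my_kernel_inst"):
    """Return {filename: source}, one translation unit per shard."""
    files = {}
    for idx, chunk in enumerate(shard(items, num_shards)):
        body = [f'#include "{header}"', ""] + chunk
        files[f"{stem}_{idx}.cpp"] = "\n".join(body) + "\n"
    return files


def write_shards(out_dir, header, items, num_shards):
    """Write each shard to out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, source in render_shards(header, items, num_shards).items():
        path = out / name
        path.write_text(source)
        paths.append(path)
    return paths
```

For example, `write_shards("generated", "my_kernel.h", INSTANTIATIONS, 4)` produces four files that would then be added to the library's source list. Each shard re-parses the header, so adding more shards than there are available cores is unlikely to help much further.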
@jataylo We need to check this PR on CUDA to see if it provides similar gains. If so, we should file an upstream PR for this.
jenkins retest this please