PyTorch 1.8.1 on conda is 1.27GB
Considering that on conda we link against the conda-provided CUDA toolkit and the conda-provided MKL, it seems like there's a bug somewhere if our PyTorch binaries on conda are this large. Please check.
cc @malfet and @seemethere
Update: we link cuDNN statically when shipping to conda, because neither https://anaconda.org/anaconda/cudnn nor https://anaconda.org/nvidia/cudnn has the versions we depend on. (And cuDNN for 11.1 is much bigger than the one for 10.2.) Quick CUBIN size comparison:
$ ~/git/torch-builder/analytics/cubinsizes.py unp-10.2/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so
Analyzing unp-10.2/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so
.nv_fatbin size 986.6MiB
  ptx_37: 189.6MiB
  sm_37: 72.5MiB
  sm_50: 139.9MiB
  sm_60: 147.6MiB
  sm_61: 137.5MiB
  sm_70: 151.0MiB
  sm_75: 134.0MiB
  sm_35: 14.5MiB
__nv_relfatbin size 395.6KiB
  ptx_37: 43.2KiB
  sm_37: 54.5KiB
  sm_50: 59.2KiB
  sm_60: 59.5KiB
  sm_61: 59.5KiB
  sm_70: 60.0KiB
  sm_75: 59.6KiB
$ ~/git/torch-builder/analytics/cubinsizes.py unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so
Analyzing unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so
.nv_fatbin size 1.2GiB
  ptx_37: 234.9MiB
  sm_37: 87.2MiB
  sm_50: 146.9MiB
  sm_60: 148.7MiB
  sm_61: 132.6MiB
  sm_70: 112.7MiB
  sm_75: 96.9MiB
  sm_80: 111.8MiB
  sm_86: 110.5MiB
__nv_relfatbin size 0.0B
$ ~/git/torch-builder/analytics/cubinsizes.py unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cpp.so
Analyzing unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cpp.so
.nv_fatbin size 663.0MiB
  ptx_37: 4.8MiB
  sm_37: 9.2MiB
  sm_50: 45.1MiB
  sm_60: 54.0MiB
  sm_61: 54.6MiB
  sm_70: 82.8MiB
  sm_75: 75.7MiB
  sm_80: 96.0MiB
  sm_86: 95.8MiB
  sm_35: 20.4MiB
  ptx_70: 124.7MiB
__nv_relfatbin size 576.4KiB
  ptx_37: 55.2KiB
  sm_37: 58.2KiB
  sm_50: 64.9KiB
  sm_60: 65.7KiB
  sm_61: 65.7KiB
  sm_70: 67.1KiB
  sm_75: 66.6KiB
  sm_80: 66.5KiB
  sm_86: 66.5KiB
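For anyone who wants to cross-check the totals above without the analytics script, here is a minimal sketch (not the actual cubinsizes.py) that reports the overall .nv_fatbin section size of a shared library by parsing readelf output. It assumes binutils' readelf is on PATH and only gives the section total; the per-architecture breakdown additionally requires walking the fatbin headers, which is omitted here.

#!/usr/bin/env python3
# Minimal sketch: total size of the .nv_fatbin section in a shared library,
# read from `readelf -S -W` output. Not the real cubinsizes.py.
import subprocess
import sys


def nv_fatbin_size(lib_path: str) -> int:
    """Return the size in bytes of the .nv_fatbin section, or 0 if absent."""
    out = subprocess.check_output(["readelf", "-S", "-W", lib_path], text=True)
    for line in out.splitlines():
        tokens = line.split()
        if ".nv_fatbin" in tokens:
            idx = tokens.index(".nv_fatbin")
            # Columns after Name are: Type, Address, Off, Size (hex)
            return int(tokens[idx + 4], 16)
    return 0


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: .nv_fatbin {nv_fatbin_size(path) / 2**20:.1f}MiB")

Usage would be e.g. `python nv_fatbin_size.py .../libtorch_cuda_cu.so`, which should roughly match the .nv_fatbin totals printed above.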
In that case, if we are linking against the system cuDNN, it has to be pruned first, I guess.
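To make "pruning" concrete: the idea would be to run nvprune over the static library so that it keeps device code only for the architectures we actually ship. The sketch below is purely illustrative (the paths and architecture list are assumptions, not our build config), and as the next comment explains, this approach does not yield a working library for CUDA 11.x cuDNN/cuBLAS.

#!/usr/bin/env python3
# Illustrative sketch of pruning a static CUDA library with nvprune:
# keep only the listed -gencode targets and drop everything else.
import subprocess

# Example architecture list; the real TORCH_CUDA_ARCH_LIST may differ.
ARCHS = ["sm_60", "sm_61", "sm_70", "sm_75", "sm_80", "sm_86"]


def prune_static_lib(src: str, dst: str) -> None:
    cmd = ["nvprune", "-o", dst]
    for arch in ARCHS:
        compute = arch.replace("sm_", "compute_")
        cmd += ["-gencode", f"arch={compute},code={arch}"]
    cmd.append(src)
    subprocess.check_call(cmd)


if __name__ == "__main__":
    # Hypothetical input/output paths; adjust to the local install.
    prune_static_lib("/usr/local/cuda/lib64/libcudnn_static.a",
                     "libcudnn_static_pruned.a")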
@soumith we can't prune CuDNN for 11.1, as it results in an unusable library; see the following comment, which reproduces the problem with CuBLAS, but CuDNN is similarly affected: https://github.com/pytorch/pytorch/issues/53336#issuecomment-791849506
cudnn in conda-forge is up-to-date and is currently maintained by NVIDIA: https://anaconda.org/conda-forge/cudnn. So is cudatoolkit: https://github.com/conda-forge/cudatoolkit-feedstock. Because the PyTorch builds there can link dynamically against those packages instead of bundling them, the conda-forge CUDA 11.2 packages for PyTorch are only 630 MB: https://anaconda.org/conda-forge/pytorch/files.
Given how much better conda-forge is maintained than defaults, and that it now has significantly more users (I estimate about 10x, based on Python and NumPy download numbers), I think it's time to switch to relying on conda-forge.
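If it helps the comparison, published package sizes per channel can be checked from the channel metadata; below is a rough sketch, assuming conda is on PATH and that `conda search --json` entries expose a "size" field (as the repodata does). The channel names and spec are just the ones discussed above.

#!/usr/bin/env python3
# Rough sketch: compare the largest published pytorch package size per channel.
import json
import subprocess


def max_size_mib(channel: str, spec: str) -> float:
    out = subprocess.check_output(
        ["conda", "search", "-c", channel, "--override-channels", spec, "--json"],
        text=True,
    )
    entries = [e for pkgs in json.loads(out).values() for e in pkgs]
    return max(e["size"] for e in entries) / 2**20


if __name__ == "__main__":
    for channel in ("pytorch", "conda-forge"):
        print(channel, f"{max_size_mib(channel, 'pytorch'):.0f} MiB")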