PyTorch 1.8.1 on conda is 1.27GB
Considering that on conda we link against the conda-provided CUDA toolkit and the conda-provided MKL, it seems like there's a bug somewhere if our PyTorch binaries on conda are this large. Please check.
cc @malfet and @seemethere
Update: we link cuDNN statically when shipping to conda, because neither https://anaconda.org/anaconda/cudnn nor https://anaconda.org/nvidia/cudnn has the versions we depend on. (And cuDNN for 11.1 is much bigger than the one for 10.2.) Quick CUBIN size comparison:
$ ~/git/torch-builder/analytics/cubinsizes.py unp-10.2/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so
Analyzing unp-10.2/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so
.nv_fatbin size 986.6MiB
  ptx_37: 189.6MiB
  sm_37: 72.5MiB
  sm_50: 139.9MiB
  sm_60: 147.6MiB
  sm_61: 137.5MiB
  sm_70: 151.0MiB
  sm_75: 134.0MiB
  sm_35: 14.5MiB
__nv_relfatbin size 395.6KiB
  ptx_37: 43.2KiB
  sm_37: 54.5KiB
  sm_50: 59.2KiB
  sm_60: 59.5KiB
  sm_61: 59.5KiB
  sm_70: 60.0KiB
  sm_75: 59.6KiB
$ ~/git/torch-builder/analytics/cubinsizes.py unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so
Analyzing unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so
.nv_fatbin size 1.2GiB
  ptx_37: 234.9MiB
  sm_37: 87.2MiB
  sm_50: 146.9MiB
  sm_60: 148.7MiB
  sm_61: 132.6MiB
  sm_70: 112.7MiB
  sm_75: 96.9MiB
  sm_80: 111.8MiB
  sm_86: 110.5MiB
__nv_relfatbin size 0.0B
$ ~/git/torch-builder/analytics/cubinsizes.py unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cpp.so
Analyzing unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cpp.so
.nv_fatbin size 663.0MiB
  ptx_37: 4.8MiB
  sm_37: 9.2MiB
  sm_50: 45.1MiB
  sm_60: 54.0MiB
  sm_61: 54.6MiB
  sm_70: 82.8MiB
  sm_75: 75.7MiB
  sm_80: 96.0MiB
  sm_86: 95.8MiB
  sm_35: 20.4MiB
  ptx_70: 124.7MiB
__nv_relfatbin size 576.4KiB
  ptx_37: 55.2KiB
  sm_37: 58.2KiB
  sm_50: 64.9KiB
  sm_60: 65.7KiB
  sm_61: 65.7KiB
  sm_70: 67.1KiB
  sm_75: 66.6KiB
  sm_80: 66.5KiB
  sm_86: 66.5KiB
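For anyone who wants to cross-check the totals above without the analytics script, here is a minimal sketch (not the actual cubinsizes.py) that reports the overall .nv_fatbin section size of a shared library by parsing readelf output. It assumes binutils' readelf is on PATH and only gives the section total; the per-architecture breakdown additionally requires walking the fatbin headers, which is omitted here.

#!/usr/bin/env python3
# Minimal sketch: total size of the .nv_fatbin section in a shared library,
# read from `readelf -S -W` output. Not the real cubinsizes.py.
import subprocess
import sys


def nv_fatbin_size(lib_path: str) -> int:
    """Return the size in bytes of the .nv_fatbin section, or 0 if absent."""
    out = subprocess.check_output(["readelf", "-S", "-W", lib_path], text=True)
    for line in out.splitlines():
        tokens = line.split()
        if ".nv_fatbin" in tokens:
            idx = tokens.index(".nv_fatbin")
            # Columns after Name are: Type, Address, Off, Size (hex)
            return int(tokens[idx + 4], 16)
    return 0


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{path}: .nv_fatbin {nv_fatbin_size(path) / 2**20:.1f}MiB")

Usage would be e.g. `python nv_fatbin_size.py .../libtorch_cuda_cu.so`, which should roughly match the .nv_fatbin totals printed above.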
In that case, if we are linking against the system cuDNN, it has to be pruned first, I guess.
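To make "pruning" concrete: the idea would be to run nvprune over the static library so that it keeps device code only for the architectures we actually ship. The sketch below is purely illustrative (the paths and architecture list are assumptions, not our build config), and as the next comment explains, this approach does not yield a working library for CUDA 11.x cuDNN/cuBLAS.

#!/usr/bin/env python3
# Illustrative sketch of pruning a static CUDA library with nvprune:
# keep only the listed -gencode targets and drop everything else.
import subprocess

# Example architecture list; the real TORCH_CUDA_ARCH_LIST may differ.
ARCHS = ["sm_60", "sm_61", "sm_70", "sm_75", "sm_80", "sm_86"]


def prune_static_lib(src: str, dst: str) -> None:
    cmd = ["nvprune", "-o", dst]
    for arch in ARCHS:
        compute = arch.replace("sm_", "compute_")
        cmd += ["-gencode", f"arch={compute},code={arch}"]
    cmd.append(src)
    subprocess.check_call(cmd)


if __name__ == "__main__":
    # Hypothetical input/output paths; adjust to the local install.
    prune_static_lib("/usr/local/cuda/lib64/libcudnn_static.a",
                     "libcudnn_static_pruned.a")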
@soumith we can't prune CuDNN for 11.1, as it results in an unusable library; see the following comment, which reproduces the problem with CuBLAS, but CuDNN is similarly affected: https://github.com/pytorch/pytorch/issues/53336#issuecomment-791849506
cudnn in conda-forge is up-to-date and is currently maintained by NVIDIA: https://anaconda.org/conda-forge/cudnn. So is cudatoolkit: https://github.com/conda-forge/cudatoolkit-feedstock. Because the PyTorch builds there can link dynamically against those packages instead of bundling them, the conda-forge CUDA 11.2 packages for PyTorch are only 630 MB: https://anaconda.org/conda-forge/pytorch/files.
Given how much better conda-forge is maintained than defaults, and that it now has significantly more users (I estimate about 10x, based on Python and NumPy download numbers), I think it's time to switch to relying on conda-forge.
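If it helps the comparison, published package sizes per channel can be checked from the channel metadata; below is a rough sketch, assuming conda is on PATH and that `conda search --json` entries expose a "size" field (as the repodata does). The channel names and spec are just the ones discussed above.

#!/usr/bin/env python3
# Rough sketch: compare the largest published pytorch package size per channel.
import json
import subprocess


def max_size_mib(channel: str, spec: str) -> float:
    out = subprocess.check_output(
        ["conda", "search", "-c", channel, "--override-channels", spec, "--json"],
        text=True,
    )
    entries = [e for pkgs in json.loads(out).values() for e in pkgs]
    return max(e["size"] for e in entries) / 2**20


if __name__ == "__main__":
    for channel in ("pytorch", "conda-forge"):
        print(channel, f"{max_size_mib(channel, 'pytorch'):.0f} MiB")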