
Tune CUDA architectures

Open. RAMitchell opened this issue.

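CUDA SASS code is binary-compatible across minor revisions within a major architecture: a cubin built for compute capability X.y runs on any X.z device with z >= y. We therefore only need SASS for one architecture per major version (the lowest minor revision), unless particular features are needed. Since we don't use any special features (e.g. tensor cores), one per major version is enough.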
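As a minimal sketch of that rule (not XGBoost's build code; the function name `sass_targets` and the architecture list are hypothetical), the following Python keeps the lowest minor revision in each major compute-capability family, which is enough SASS to cover every requested device in that family:

```python
def sass_targets(archs):
    """Keep the lowest minor revision per major compute capability.

    archs: iterable of compute capabilities as integers, e.g. 86 for sm_86.
    A cubin built for X.y runs on X.z when z >= y, so the lowest minor
    revision in each major family covers the rest of that family.
    """
    lowest = {}
    for cc in archs:
        major, minor = divmod(cc, 10)
        if major not in lowest or minor < lowest[major]:
            lowest[major] = minor
    # Rebuild the integer form, ordered by major version.
    return [major * 10 + minor for major, minor in sorted(lowest.items())]


# Hypothetical list of architectures a build might target.
print(sass_targets([50, 52, 60, 61, 70, 75, 80, 86]))  # → [50, 60, 70, 80]
```

Coverage is determined by the lowest minor revision actually present: if a list contained `86` but not `80` for major 8, the sketch would keep `86`, and 8.0 devices would not be covered by that SASS.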

This also fixes an issue where only PTX, and no SASS, was generated for the latest architecture.
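One way to express the intended output, again as a hypothetical Python helper rather than the project's actual CMake logic, is to emit nvcc `-gencode` flags that request SASS (`code=sm_XX`) for every selected architecture and additionally embed PTX (`code=compute_XX`) for the newest one, so GPUs newer than any listed target can still JIT-compile the kernels:

```python
def gencode_flags(sass_archs):
    """Build nvcc -gencode flags: SASS for every listed architecture,
    plus PTX for the newest one so newer GPUs can JIT-compile it."""
    archs = sorted(set(sass_archs))
    flags = []
    for cc in archs:
        # Real machine code (SASS) for this architecture.
        flags.append(f"-gencode=arch=compute_{cc},code=sm_{cc}")
    newest = archs[-1]
    # PTX for the newest architecture, in addition to its SASS above.
    flags.append(f"-gencode=arch=compute_{newest},code=compute_{newest}")
    return flags


# Hypothetical architecture list (one per major version).
print(" ".join(gencode_flags([50, 60, 70, 80])))
```

Here the newest architecture gets both SASS and PTX, which is the combination the issue describes as missing; the SASS entry avoids a JIT compile on that architecture, while the PTX entry keeps forward compatibility.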

Binary size on my machine (without NCCL) goes from 182 MB to 114 MB. Compile times will also be shorter.

RAMitchell, Aug 10 '22 09:08