xgboost
Tune CUDA architectures
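CUDA SASS code is binary compatible across minor revisions within a major architecture (SASS built for an earlier minor revision runs on later ones of the same major), so we only need one target per major architecture unless particular features are needed. As we don't use any such features (e.g. tensor cores), one per major architecture is enough.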
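A minimal sketch of the selection rule, assuming compute capabilities are given as plain strings such as `"61"` (no suffixed targets like `90a`). The function name `one_per_major` is hypothetical and is not part of the xgboost build, which configures architectures through CMake. Because SASS runs only on the same or a later minor revision of its major architecture, the lowest minor of each major covers the whole family.

```python
def one_per_major(archs):
    """Keep the lowest compute capability for each major architecture.

    Hypothetical helper: SASS built for sm_XY runs on sm_XZ when Z >= Y,
    so the lowest minor revision of each major covers that whole family.
    """
    lowest = {}
    for cc in archs:
        major, minor = divmod(int(cc), 10)  # "86" -> (8, 6), "100" -> (10, 0)
        if major not in lowest or minor < lowest[major]:
            lowest[major] = minor
    return [f"{major}{minor}" for major, minor in sorted(lowest.items())]


if __name__ == "__main__":
    print(one_per_major(["60", "61", "70", "75", "80", "86"]))
```

With the example list above, six targets collapse to `60`, `70` and `80`, which is where the binary size and compile time savings come from.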
Also fixes an issue where only PTX, and no SASS, was generated for the latest architecture.
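A sketch of the intended output, assuming the build emits standard nvcc `-gencode` flags. The helper `gencode_flags` is hypothetical. Every selected architecture gets SASS (`code=sm_XX`), and the newest one additionally gets PTX (`code=compute_XX`) so that future GPUs can JIT-compile it, instead of the newest architecture receiving PTX only.

```python
def gencode_flags(archs):
    """Build nvcc -gencode flags (hypothetical helper).

    SASS is generated for every architecture, including the latest;
    PTX is added only for the latest, for forward compatibility via JIT.
    """
    flags = [f"-gencode=arch=compute_{cc},code=sm_{cc}" for cc in archs]
    latest = max(archs, key=int)
    flags.append(f"-gencode=arch=compute_{latest},code=compute_{latest}")
    return flags


if __name__ == "__main__":
    for flag in gencode_flags(["60", "70", "80"]):
        print(flag)
```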
Binary size on my machine (without NCCL) goes from 182 MB to 114 MB. Compile times should also be shorter.