cub icon indicating copy to clipboard operation
cub copied to clipboard

Allow RDC builds without CDP support

Open gevtushenko opened this issue 3 years ago • 2 comments

The following code leads to excessive memory footprint when compiled with -rdc=true:

#include <iostream>
#include <cub/device/device_segmented_sort.cuh>

template class cub::DispatchSegmentedSort<false, int, int, int,int*, int*>;

int main() {
    size_t free_byte{}, total_byte{};
    double used{};

    cudaMemGetInfo(&free_byte, &total_byte);

    used = static_cast<double>(total_byte - free_byte) / 1024.0 / 1024.0;
    std::cout << "CUB " << CUB_VERSION << ": " << used << " MB" << std::endl;

    return 0;
}
:nvcc -O3 -rdc=true testSort.cu
:./a.out
CUB 200000: 1410.75 MB
:nvcc -O3 testSort.cu
:./a.out
CUB 200000: 616.75 MB

The issue can be reduced to the following code:

#include <iostream>

__global__ void continuation() { }

__global__ void kernel()
{
  continuation<<<1, 256>>>();
  __cudaDeviceSynchronizeDeprecationAvoidance();
}

int main() {
  size_t free_byte{}, total_byte{};
  double used{};

  cudaMemGetInfo(&free_byte, &total_byte);

  used = static_cast<double>(total_byte - free_byte) / 1024.0 / 1024.0;
  std::cout << used << " MB" << std::endl;
}

The __cudaDeviceSynchronizeDeprecationAvoidance is used on device only when debug_synchronous is set. I suggest we restrict "synchronous" part if the option in the case of device-side usage.

The second issue here is that we take -rdc=true as a desire to launch device-scope algorithms from device. Although -rdc=true is a requirement for device-side launches, it doesn't necessarily used for that purpose. The option might indicate that device code is distributed in a library, or that some device code is compiled in a different TU. I suggest we take a closer look at the overheads associated with device-side launches support and in case ones are found, separate rdc concept from device-side launch support.

gevtushenko avatar Jul 02 '22 15:07 gevtushenko

Agreed, this does sound like a reasonable option. We'd need to default to using the RDC state, but provide an opt-out for CDP support.

alliepiper avatar Jul 25 '22 21:07 alliepiper

Bumping priority after taking a closer look at this:

:nvcc -O3 -rdc=true testSort.cu
:./a.out
CUB 200000: 1410.75 MB
:nvcc -O3 testSort.cu
:./a.out
CUB 200000: 616.75 MB

😬

alliepiper avatar Jul 25 '22 21:07 alliepiper