TensorComprehensions Cuda Performance Metrics/Profiling

Add CUPTI-based profiling functionality in CudaRTCFun.

There are several performance metrics (listed here). Each metric requires measuring one or more types of hardware events. Since not all events can be measured at the same time, a kernel must be launched multiple times (12 times with the current hardcoded list of metrics).

Mar 23 '18 14:03 thetheodor
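
To illustrate why several launches are needed, here is a minimal sketch of how metrics could be packed into passes. It is a toy model, not the code in this pull request: MetricSpec, planPasses and the per-pass event capacity are hypothetical names and assumptions introduced for illustration. CUPTI has its own mechanism for grouping events into passes, which this sketch does not reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a metric is a name plus the hardware events it needs.
struct MetricSpec {
  std::string name;
  std::set<int> events;
};

// Greedily assign metrics to passes so that the distinct events needed in one
// pass do not exceed `capacity` (an assumed hardware limit). Returns, for each
// pass, the indices of the metrics measured in it. Each pass corresponds to
// one kernel launch. A metric that alone needs more than `capacity` events
// still gets a pass of its own in this toy model.
std::vector<std::vector<std::size_t>> planPasses(
    const std::vector<MetricSpec>& metrics, std::size_t capacity) {
  std::vector<std::vector<std::size_t>> passes;
  std::vector<std::set<int>> passEvents;
  for (std::size_t i = 0; i < metrics.size(); ++i) {
    bool placed = false;
    for (std::size_t p = 0; p < passes.size() && !placed; ++p) {
      // Events shared with metrics already in the pass cost nothing extra.
      std::set<int> merged = passEvents[p];
      merged.insert(metrics[i].events.begin(), metrics[i].events.end());
      if (merged.size() <= capacity) {
        passEvents[p] = merged;
        passes[p].push_back(i);
        placed = true;
      }
    }
    if (!placed) {
      // No existing pass has room: this metric costs one more launch.
      passes.push_back(std::vector<std::size_t>{i});
      passEvents.push_back(metrics[i].events);
    }
  }
  return passes;
}
```

With three metrics needing events {1, 2}, {2, 3} and {4, 5} and a capacity of 3, the first two share a pass and the third needs a second launch.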

On Tue, Apr 03, 2018 at 08:29:43AM -0700, Theodoros Theodoridis wrote:

ttheodor commented on this pull request.

target_link_libraries( tc_cuda
  ${CUDA_CUDA_LIBRARIES}
  ${CUDA_curand_LIBRARY}
  ${CUDA_LIBRARIES}
  ${CUDA_NVRTC_LIBRARIES}
  ${CUDA_cupti_LIBRARY}

find_package(cuda) should have set it. Is there a cupti directory under cuda_root_dir/extras on your machine?

No, I don't have that on any of my machines. Do I need to install something extra? Then please update the documentation in this commit.

skimo

Apr 03 '18 15:04 skimo-openhub

You shouldn't have to install it separately (I didn't).

The CUPTI library is supported on all platforms supported by the CUDA Toolkit, and is available on the CUDA Downloads page as part of the CUDA Tools SDK.

Apr 03 '18 16:04 thetheodor

On Tue, Apr 03, 2018 at 08:29:43AM -0700, Theodoros Theodoridis wrote:

ttheodor commented on this pull request.

target_link_libraries( tc_cuda
  ${CUDA_CUDA_LIBRARIES}
  ${CUDA_curand_LIBRARY}
  ${CUDA_LIBRARIES}
  ${CUDA_NVRTC_LIBRARIES}
  ${CUDA_cupti_LIBRARY}

find_package(cuda) should have set it. Is there a cupti directory under cuda_root_dir/extras on your machine?

Do you mean find_package(CUDA REQUIRED)? That's conditioned on WITH_CUDA and I'm building with -DWITH_CUDA=0. Where exactly does it set CUDA_cupti_LIBRARY and what is it looking for exactly?

skimo

Apr 03 '18 16:04 skimo-openhub

Yes, I meant find_package(CUDA REQUIRED). It uses FindCUDA.cmake, which ships with CMake. FindCUDA.cmake searches for all CUDA-related libraries and sets the corresponding variables (such as CUDA_cupti_LIBRARY).

Apr 03 '18 16:04 thetheodor
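
As a rough diagnostic for the question of what is being looked for, the following sketch checks a few places where the CUPTI shared library might live under a CUDA root. The candidate paths are assumptions based on a typical Linux toolkit layout, not a transcription of what FindCUDA.cmake does, and cuptiCandidates and findCupti are hypothetical helper names.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Assumed locations of the CUPTI shared library relative to a CUDA root
// directory on Linux. FindCUDA.cmake may search other locations as well.
std::vector<std::string> cuptiCandidates(const std::string& cudaRoot) {
  return {
      cudaRoot + "/extras/CUPTI/lib64/libcupti.so",
      cudaRoot + "/extras/CUPTI/lib/libcupti.so",
      cudaRoot + "/lib64/libcupti.so",
  };
}

// Returns the first candidate that can be opened for reading,
// or an empty string if none of them exists.
std::string findCupti(const std::string& cudaRoot) {
  for (const std::string& path : cuptiCandidates(cudaRoot)) {
    std::ifstream file(path.c_str());
    if (file.good()) {
      return path;
    }
  }
  return "";
}
```

An empty result from findCupti for the CUDA root in use would suggest that CUPTI is not installed there, in which case CUDA_cupti_LIBRARY would presumably be left unset.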

On Tue, Apr 03, 2018 at 04:07:55PM +0000, Theodoros Theodoridis wrote:

You shouldn't have to install it separately (I didn't).

The CUPTI library is supported on all platforms supported by the CUDA Toolkit, and is available on the CUDA Downloads page as part of the CUDA Tools SDK.

So I should install this "CUDA Tools SDK"?

skimo

Apr 03 '18 17:04 skimo-openhub

The CUDA Tools SDK is what you get when you install CUDA with your package manager, so it is probably already installed on your system. By the way, what cmake version are you using?

Apr 03 '18 17:04 thetheodor

On Tue, Apr 03, 2018 at 05:18:59PM +0000, Theodoros Theodoridis wrote:

The CUDA Tools SDK is what you get when you install CUDA with your package manager, so it is probably already installed on your system.

Then it doesn't include the CUPTI thing. (I'm on Ubuntu 17.04)

By the way, what cmake version are you using?

cmake version 3.7.2

skimo

Apr 03 '18 18:04 skimo-openhub

@skimo-openhub if you are building with WITH_CUDA=0 ... ./build.sh (not -DWITH_CUDA=0, not sure if there is a difference) then you shouldn't see this. I suspect an issue in your build. I don't think CUPTI will install if you have no physical GPU even though the SDK itself will install fine for cross compilation.

Apr 04 '18 07:04 nicolasvasilache

On Wed, Apr 04, 2018 at 12:10:50AM -0700, Nicolas Vasilache wrote:

@skimo-openhub if you are building with WITH_CUDA=0 ... ./build.sh (not -DWITH_CUDA=0, not sure if there is a difference) then you shouldn't see this.

Thanks. That works on the machine without a card.

I suspect an issue in your build. I don't think CUPTI will install if you have no physical GPU even though the SDK itself will install fine for cross compilation.

I get the same error on the machine with a card. I can run test.sh just fine on this machine on master, but I can't configure it on cupti.

skimo

Apr 04 '18 07:04 skimo-openhub

On Tue, Apr 03, 2018 at 11:20:36AM -0700, skimo-openhub wrote:

On Tue, Apr 03, 2018 at 05:18:59PM +0000, Theodoros Theodoridis wrote:

The CUDA Tools SDK is what you get when you install CUDA with your package manager, so it is probably already installed on your system.

Then it doesn't include the CUPTI thing. (I'm on Ubuntu 17.04)

For the machine with a card, it's Ubuntu 17.10

By the way, what cmake version are you using?

cmake version 3.7.2

and cmake version 3.9.1

skimo

Apr 04 '18 07:04 skimo-openhub

@caffe2bot retest this please

Apr 04 '18 09:04 nicolasvasilache

On Tue, Apr 03, 2018 at 05:39:57PM +0200, Sven Verdoolaege wrote:

On Tue, Apr 03, 2018 at 08:29:43AM -0700, Theodoros Theodoridis wrote:

ttheodor commented on this pull request.

target_link_libraries( tc_cuda
  ${CUDA_CUDA_LIBRARIES}
  ${CUDA_curand_LIBRARY}
  ${CUDA_LIBRARIES}
  ${CUDA_NVRTC_LIBRARIES}
  ${CUDA_cupti_LIBRARY}

find_package(cuda) should have set it. Is there a cupti directory under cuda_root_dir/extras on your machine?

No, I don't have that on any of my machines. Do I need to install something extra?

Installing the package libcupti-dev solved the problem for me.

skimo

Apr 04 '18 09:04 skimo-openhub

On Wed, Apr 04, 2018 at 09:58:34AM +0000, Theodoros Theodoridis wrote:

ttheodor commented on this pull request.

  size_t valueKindSize = sizeof(valueKind);
  TC_CUPTI_CHECK(cuptiMetricGetAttribute(
      metric.id, CUPTI_METRIC_ATTR_VALUE_KIND, &valueKindSize, &valueKind));
  return valueKind;
}

double metricValueAsDouble(const CudaMetric& metric) {
  auto valueKind = getValueKind(metric);
  if (valueKind == CUPTI_METRIC_VALUE_KIND_DOUBLE) {
    return metric.value.metricValueDouble;
  } else if (valueKind == CUPTI_METRIC_VALUE_KIND_PERCENT) {
    return metric.value.metricValuePercent;
  } else {
    CHECK(false) << "Invalid metric value conversion.";
  }
}

The compiler can't figure out that CHECK(false) always leads to program termination. I could return dummy values to make the compiler happy.

Please do.

skimo

Apr 04 '18 10:04 skimo-openhub
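
The dummy-return approach discussed above can be sketched as follows. This is a self-contained illustration, not the code that was merged: ValueKind, Metric and failCheck are hypothetical stand-ins for the CUPTI value kinds, TC's CudaMetric and CHECK(false), and failCheck throws instead of aborting so the failure path can be exercised.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-ins for the CUPTI metric value kinds and CudaMetric.
enum class ValueKind { Double, Percent, Uint64 };

struct Metric {
  ValueKind kind;
  double doubleValue;
  double percentValue;
};

// Stand-in for CHECK(false). It is not marked [[noreturn]], so at the call
// site the compiler has to assume that control may continue past the call.
void failCheck(const char* message) {
  throw std::runtime_error(message);
}

double metricValueAsDouble(const Metric& metric) {
  if (metric.kind == ValueKind::Double) {
    return metric.doubleValue;
  } else if (metric.kind == ValueKind::Percent) {
    return metric.percentValue;
  }
  failCheck("Invalid metric value conversion.");
  // Dummy value, never reached at run time. It only ensures that every
  // path visible to the compiler ends in a return statement.
  return 0.0;
}
```

An alternative would be to route the failure through a function the compiler knows never returns, but a dummy return keeps the change local to the function.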

On Wed, Apr 04, 2018 at 02:46:15PM +0000, Theodoros Theodoridis wrote:

ttheodor commented on this pull request.

@@ -116,7 +116,11 @@ void CUPTIAPI bufferCompleted(
   // since we launched only 1 kernel, we should have only 1 kernel record
   TC_CUPTI_CHECK(cuptiActivityGetNextRecord(buffer, validSize, &record));
+#if (CUPTI_API_VERSION >= 10)

The commit message refers to the CUDA version. The check is on the CUPTI version (the two are tied, but they use different numbers).

Thanks. I missed that.

skimo

Apr 04 '18 14:04 skimo-openhub
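
For context, gating code on the CUPTI API version rather than on the CUDA version might look like the sketch below. The record types and kernelDurationNs are hypothetical placeholders, and CUPTI_API_VERSION is defined locally only so that the sketch compiles without the CUDA toolkit; in a real build it would come from the CUPTI headers.

```cpp
#include <cassert>

// Assumption: in a real build this macro is provided by <cupti.h>.
// The fallback value exists only to make the sketch self-contained.
#ifndef CUPTI_API_VERSION
#define CUPTI_API_VERSION 10
#endif

// Hypothetical record layouts standing in for two versions of a CUPTI
// kernel activity record. The extra field is invented for illustration.
struct KernelRecordOld {
  unsigned long long start;
  unsigned long long end;
};

struct KernelRecordNew {
  unsigned long long start;
  unsigned long long end;
  unsigned long long extra;
};

// Select the record type at compile time from the CUPTI API version,
// not from the CUDA toolkit version, since the two are numbered differently.
#if (CUPTI_API_VERSION >= 10)
using KernelRecord = KernelRecordNew;
#else
using KernelRecord = KernelRecordOld;
#endif

// Kernel duration in nanoseconds from the record's timestamps.
unsigned long long kernelDurationNs(const KernelRecord& record) {
  return record.end - record.start;
}
```

Code that only touches fields common to both layouts, such as kernelDurationNs here, stays the same on either side of the version check.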
