cudnn-frontend
Why is graph::check_support really slow?
I am using the cudnn_frontend to perform a simple matmul - all works as expected when using the FLOAT data type, but graph::check_support is really slow. Is there any way to speed this up?
Also, the check_support fails when using a DOUBLE data type - is this expected?
Thanks.
The cudnn support surface is documented here: Developer Guide
DOUBLE is not supported by any of the operations, so check_support() failing for it is the expected behavior.
Engines involving runtime compilation might cause latency. check_support() checks execution plans from many different engines; you can look at the various graph engines used by cudnn in the Developer Guide.
Refer to this Graph caching sample to avoid paying the check_support() penalty across multiple executions.
- What operation graph are you looking to target eventually with cudnn_frontend?
- Are you using the Python or C++ API?
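For context, here is a minimal sketch of how check_support() fits into the build pipeline for a simple matmul graph with the v1.x C++ API, along the lines of the matmul sample in this repo. The handle, shapes, and heuristic mode are illustrative placeholders and error handling is omitted; treat it as a sketch rather than a definitive recipe.

```cpp
#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

// Build a (b, m, k) x (b, k, n) matmul graph and check support for it.
// `handle` is assumed to be a valid cudnnHandle_t created elsewhere.
void build_matmul(cudnnHandle_t handle) {
    int64_t b = 16, m = 64, k = 128, n = 64;  // placeholder shapes

    fe::graph::Graph graph;
    graph.set_io_data_type(fe::DataType_t::FLOAT)
         .set_compute_data_type(fe::DataType_t::FLOAT);

    auto A = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("A")
                              .set_dim({b, m, k})
                              .set_stride({m * k, k, 1}));
    auto B = graph.tensor(fe::graph::Tensor_attributes()
                              .set_name("B")
                              .set_dim({b, k, n})
                              .set_stride({k * n, n, 1}));

    auto C = graph.matmul(A, B, fe::graph::Matmul_attributes().set_name("matmul"));
    C->set_output(true);

    // The slow step in question: heuristics query plus possible runtime
    // compilation of candidate engines.
    auto status = graph.validate();
    status = graph.build_operation_graph(handle);
    status = graph.create_execution_plans({fe::HeurMode_t::A});
    status = graph.check_support(handle);
    status = graph.build_plans(handle);
    (void)status;  // error handling omitted in this sketch
}
```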
Makes sense - I will check out the caching example. I am using C++. I presume the slowdown is caused by a runtime compile of the kernel under the hood. Regarding the caching: is the graph dependent on the shapes of the tensors specified in each op added? For example, when using the matmul op, a tensor A and tensor B are passed to the operation. I get that the dimensions must match up for the MatMul op, but is the graph built by create_execution_plans and build_plans dependent on all future tensors being the same shape as the original A and B tensors?
Hi @ZoroDerVonCodier,
A graph is a function of its internal node (vertex) and tensor (edge) properties. Hence, you are correct to assume the graph is dependent on the shapes of tensors A and B.
To answer your question: yes, we will need to build two graphs if the tensor shape changes. However, note that the graphs can be cached. Essentially, build a cache which looks like:
fn(Matmul, TensorA0, TensorB0) -> Graph0
fn(Matmul, TensorA1, TensorB1) -> Graph1
fn(Matmul, TensorA2, TensorB2) -> Graph2
After the initial iteration, the cached graph can be reused with different data pointers as long as the shape and size are the same.
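As a rough illustration of that idea (not the exact scheme from the Graph caching sample), one could key pre-built graphs by the input shapes. `build_matmul_graph` below is a hypothetical helper, assumed to run validate / build_operation_graph / create_execution_plans / check_support / build_plans once for a given shape pair:

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

#include <cudnn_frontend.h>

namespace fe = cudnn_frontend;

// Hypothetical helper (not part of the library): builds, checks support for,
// and finalizes a matmul graph for the given A/B shapes.
std::shared_ptr<fe::graph::Graph> build_matmul_graph(cudnnHandle_t handle,
                                                     std::vector<int64_t> const& a_dim,
                                                     std::vector<int64_t> const& b_dim);

// Cache keyed by the A and B shapes; after the first build, later calls with
// the same shapes reuse the finalized plan and only swap in new data pointers.
using ShapeKey = std::pair<std::vector<int64_t>, std::vector<int64_t>>;
static std::map<ShapeKey, std::shared_ptr<fe::graph::Graph>> graph_cache;

std::shared_ptr<fe::graph::Graph> get_or_build_graph(cudnnHandle_t handle,
                                                     std::vector<int64_t> const& a_dim,
                                                     std::vector<int64_t> const& b_dim) {
    ShapeKey key{a_dim, b_dim};
    auto it = graph_cache.find(key);
    if (it != graph_cache.end()) {
        return it->second;  // shapes seen before: skip check_support()/build_plans()
    }
    auto graph = build_matmul_graph(handle, a_dim, b_dim);  // pays the one-time cost
    graph_cache.emplace(key, graph);
    return graph;
}
```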
Hope it helps.
Hi,
With cudnn frontend >1.5.2 and cudnn backend >9.2.0, check_support() should be much faster (a 100x+ improvement).
Kindly let us know if you have more questions.
Thanks