cudnn-frontend

cudnn_frontend provides a C++ wrapper for the cuDNN backend API, along with samples showing how to use it.

Results: 22 cudnn-frontend issues

The shipped wheels on PyPI.org are missing the include files, which drastically reduces the usefulness of the package. CC: @ksivaman @ptrendx

cudnn-frontend-1.5.0

Hi all, I am building the graph shown in the image: ![Image](https://github.com/user-attachments/assets/8f0c54f6-8e8e-4d2b-ab3f-2975acc095e1) The [documentation](https://docs.nvidia.com/deeplearning/cudnn/latest/developer/graph-api.html#single-operation) suggests this graph is supported, but I get a segfault in `get_uid` when calling `graph->execute`,...

Hi, I tried running an sdpa_fp8 graph where seqlen_q and seqlen_k differ; however, it seems to use only seqlen_q, since performance is the same when I...
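A rough cost model helps sanity-check this kind of report. Assuming the usual layout where Q is [b, h, s_q, d] and K/V are [b, h, s_kv, d], both attention matmuls (S = Q·Kᵀ and O = P·V) scale with s_q × s_kv, so one would expect runtime to change when only seqlen_k changes, at least for compute-bound sizes. The sketch below is a back-of-the-envelope estimate under those assumptions; the function name is hypothetical (not cudnn-frontend API), and softmax, scaling, and masking are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: approximate forward-pass FLOPs of scaled dot-product
// attention, counting only the two batched matmuls.
// Assumes Q is [b, h, s_q, d] and K, V are [b, h, s_kv, d].
std::int64_t sdpa_fwd_matmul_flops(std::int64_t b, std::int64_t h,
                                   std::int64_t s_q, std::int64_t s_kv,
                                   std::int64_t d) {
    assert(b > 0 && h > 0 && s_q > 0 && s_kv > 0 && d > 0);
    // Each matmul does b*h*s_q*s_kv*d multiply-adds, i.e. 2x that in FLOPs.
    const std::int64_t qk = 2 * b * h * s_q * s_kv * d;  // S = Q * K^T
    const std::int64_t pv = 2 * b * h * s_q * s_kv * d;  // O = P * V
    return qk + pv;
}
```

Because the estimate is linear in s_kv, doubling seqlen_k while holding everything else fixed doubles the matmul work; a flat runtime across such a change is what makes the report above look suspicious.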

**Describe the bug** FP8 E4M3 wgrad seems to be extremely slow compared to both FP32 and FP16, often 50x to 100x slower. I have attached the profiling results in [this...

**Describe the bug** Consider a graph with more than `MAX_OPGRAPH_OPS` nodes, for example in this code:

```cpp
#include "cudnn-frontend/include/cudnn_frontend.h"
namespace fe = cudnn_frontend;
int main() {
    cudnnHandle_t handle;
    assert(cudnnCreate(&handle) ==...
```

**Describe the bug** I run the following code:

```cpp
#include "cudnn-frontend/include/cudnn_frontend.h"
namespace fe = cudnn_frontend;
int main() {
    cudnnHandle_t handle;
    assert(cudnnCreate(&handle) == CUDNN_STATUS_SUCCESS);
    std::vector A = {1.0, 2.0, 3.0, 4.0};...
```

I keep having issues when compiling apps that require CUDA and C++ tools on Windows. I would like to learn the best version for CUDA 11.8 and CUDA 12.4. There are...

Hello all, I'm currently working with convolutional layers in `cudnn - python`, and I have a couple of questions about convolution algorithm selection and setting the number of groups....
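On the group-count question: in cuDNN's grouped-convolution convention, the filter's channel dimension holds only the per-group input channels, so a filter of shape [K, C/G, R, S] applied to an input with C channels implies G groups. The sketch below shows that arithmetic, assuming the same convention carries over to the frontend graph API (where the group count may be inferred from the input and filter channel dimensions rather than set explicitly). Both helper names are hypothetical, not cudnn-frontend API.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: filter dims [K, C/G, R, S] for a grouped 2D convolution
// with C input channels, K output channels, G groups, and an R x S kernel.
std::array<std::int64_t, 4> grouped_filter_dims(std::int64_t c, std::int64_t k,
                                                std::int64_t g, std::int64_t r,
                                                std::int64_t s) {
    assert(c > 0 && k > 0 && g > 0 && r > 0 && s > 0);
    // Both channel counts must split evenly across the groups.
    if (c % g != 0 || k % g != 0) {
        throw std::invalid_argument("C and K must both be divisible by G");
    }
    return {k, c / g, r, s};
}

// Hypothetical inverse: recover G from the input channel count C and the
// filter's per-group channel dimension.
std::int64_t infer_group_count(std::int64_t c, std::int64_t filter_c) {
    assert(c > 0 && filter_c > 0);
    if (c % filter_c != 0) {
        throw std::invalid_argument("filter channels must divide input channels");
    }
    return c / filter_c;
}
```

For example, a depthwise convolution (G = C = K) gets a per-group filter channel dimension of 1.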

I've noticed that when using PyTorch's custom autograd functions, the stride of `dO` is sometimes `(0, 0, 0, 0)`. Here's a very simple example: https://discuss.pytorch.org/t/getting-unusual-strides-when-using-pytorchs-autograd/208093. In my custom wrapper...
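A stride of 0 along a dimension of size greater than 1 means the gradient is a broadcast (expanded) view, where many logical elements alias one memory location; PyTorch can produce this, for instance, when the function's output is reduced with `.sum()`. One workaround, assuming the wrapper can afford a copy, is to detect such strides and materialize a contiguous tensor (in PyTorch, `dO.contiguous()`) with packed strides before building the cuDNN graph. The helpers below are a hypothetical sketch of that check, not part of cudnn-frontend.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// True if any dimension larger than 1 has stride 0, i.e. the tensor is a
// broadcast (expanded) view rather than a fully materialized buffer.
bool has_broadcast_stride(const std::vector<std::int64_t>& dims,
                          const std::vector<std::int64_t>& strides) {
    assert(dims.size() == strides.size());
    for (std::size_t i = 0; i < dims.size(); ++i) {
        if (dims[i] > 1 && strides[i] == 0) return true;
    }
    return false;
}

// Row-major (fully packed) strides for the given dims, i.e. the strides a
// contiguous copy of the tensor would have.
std::vector<std::int64_t> packed_strides(const std::vector<std::int64_t>& dims) {
    std::vector<std::int64_t> strides(dims.size(), 1);
    // Walk from the innermost dimension outward.
    for (std::size_t i = dims.size(); i-- > 1;) {
        strides[i - 1] = strides[i] * dims[i];
    }
    return strides;
}
```

Dimensions of size 1 are skipped on purpose, since their stride never affects addressing.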

_Downstream PyTorch issue:_ https://github.com/pytorch/pytorch/issues/133780 **Describe the bug** cuDNN frontend rejects a batch_size=0 input with `CUDNN_STATUS_BAD_PARAM`. **Expected behavior** cuDNN should return a tensor of shape [0, num_head, sequence_length, dims_per_head] or similar,...
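Until empty inputs are accepted, a caller-side workaround is to short-circuit before the graph is built: if any input dimension is zero there is no work to do, so the wrapper can return an appropriately shaped empty tensor without calling cuDNN at all. The following is a hypothetical sketch of that guard; `execute_unless_empty` and the `execute_graph` callback are illustrative names, not cudnn-frontend API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Number of elements in a tensor with the given dims (0 if any dim is 0).
std::int64_t numel(const std::vector<std::int64_t>& dims) {
    std::int64_t n = 1;
    for (std::int64_t d : dims) {
        assert(d >= 0);
        n *= d;
    }
    return n;
}

// Hypothetical guard: runs `execute_graph` only if Q is non-empty.
// Returns true if the graph was executed, false if it was skipped.
bool execute_unless_empty(const std::vector<std::int64_t>& q_dims,
                          const std::function<void()>& execute_graph) {
    if (numel(q_dims) == 0) {
        // Nothing to compute: the caller returns an empty output tensor,
        // e.g. of shape [0, num_head, sequence_length, dims_per_head].
        return false;
    }
    execute_graph();
    return true;
}
```

Keeping the check outside the graph means no cuDNN descriptors are created for degenerate shapes, so the `CUDNN_STATUS_BAD_PARAM` path is never reached.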