cudnn-frontend issues

Missing header files in the package

1

The wheels shipped on PyPI.org are missing the include files, which drastically reduces the usefulness of the package. CC: @ksivaman @ptrendx

DEKHTIARJonathan

cudnn-frontend-1.5.0

[Question] How to add bias after conv?

4

Hi all, I am building the graph shown in the image: ![Image](https://github.com/user-attachments/assets/8f0c54f6-8e8e-4d2b-ab3f-2975acc095e1) The [documentation](https://docs.nvidia.com/deeplearning/cudnn/latest/developer/graph-api.html#single-operation) suggests this graph is supported, but I got a segfault at `get_uid` when calling `graph->execute`,...

zhewenhu

sdpa_fp8 having different seqlen_q and seqlen_k

Hi, I tried running an sdpa_fp8 graph where seqlen_q and seqlen_k are different. However, it seems to use only seqlen_q, since performance is the same when I...

MustafaFayez

Extremely slow fp8 conv2d wgrad operation

4

**Describe the bug** fp8 e4m3 wgrad seems to be extremely slow compared to both FP32 and FP16, often 50x to 100x slower. I have attached the profiling results in [this...

jimgao1

cudnn-frontend crashes in case of MAX_OPGRAPH_OPS violation

**Describe the bug** Consider a graph with more than MAX_OPGRAPH_OPS nodes, for example in this code ```cpp #include "cudnn-frontend/include/cudnn_frontend.h" namespace fe = cudnn_frontend; int main() { cudnnHandle_t handle; assert(cudnnCreate(&handle) ==...

gritukan

Wrong result of tensor addition with broadcasting

**Describe the bug** I run the following code ```cpp #include "cudnn-frontend/include/cudnn_frontend.h" namespace fe = cudnn_frontend; int main() { cudnnHandle_t handle; assert(cudnnCreate(&handle) == CUDNN_STATUS_SUCCESS); std::vector A = {1.0, 2.0, 3.0, 4.0};...

gritukan

Which Visual Studio 2022 BuildTools MSVC is the best version for Cuda 11.8 and Cuda 12.4 and so

I keep having issues when compiling apps that require CUDA and C++ tools on Windows. I would like to learn the best version for CUDA 11.8 and CUDA 12.4. There are...

FurkanGozukara

[Question] How is the convolution algorithm set, and how to specify group number ?

Hello all, I'm currently working with convolutional layers in `cudnn - python`, and I have a couple of questions regarding the convolution algorithm selection and the setting of group numbers....

haoqian-hao

[Question] Making dO contiguous affects output?

2

I've noticed that, when using PyTorch's custom autograd functions, the stride of `dO` can sometimes be `(0, 0, 0, 0)`. Here's a very simple example: https://discuss.pytorch.org/t/getting-unusual-strides-when-using-pytorchs-autograd/208093. In my custom wrapper...

vedantroy

Support batch size 0

_Downstream PyTorch issue:_ https://github.com/pytorch/pytorch/issues/133780 **Describe the bug** The cuDNN frontend rejects batch_size=0 input with `CUDNN_STATUS_BAD_PARAM`. **Expected behavior** cuDNN should return a tensor of shape [0, num_head, sequence_length, dims_per_head], or something like that,...

Birch-san
