
[Question] How to perform backpropagation for a conv + sigmoid layer?

Open zhewenhu opened this issue 1 year ago • 3 comments

Hi,

I have implemented the forward pass using a convolution + sigmoid_fwd activation and am now working on the backpropagation of the graph. However, according to the documentation, a graph of sigmoid_bwd + dgrad/wgrad is not supported. I tried to build this graph anyway but got the error: No valid engine configs for SIGMOID_BWD_ConvBwdData_. Does cuDNN offer any alternatives or methods for implementing this backpropagation?

Here is my code for fprop:

// Forward graph: Y = sigmoid(conv(X, W)), all tensors in FP32.
graph_fwd = std::make_shared<fe::graph::Graph>();
graph_fwd->set_io_data_type(fe::DataType_t::FLOAT)
    .set_intermediate_data_type(fe::DataType_t::FLOAT)
    .set_compute_data_type(fe::DataType_t::FLOAT);

// Dims are NCHW; the strides describe a channels-last (NHWC) layout.
X = graph_fwd->tensor(fe::graph::Tensor_attributes()
                    .set_name("input")
                    .set_dim({n, c, h, w})
                    .set_stride({c * h * w, 1, c * w, c}));

W = graph_fwd->tensor(fe::graph::Tensor_attributes()
                    .set_name("weight")
                    .set_dim({k, c, r, s})
                    .set_stride({c * r * s, 1, c * s, c}));

auto conv_options =
    fe::graph::Conv_fprop_attributes().set_padding({0, 0}).set_stride({1, 1}).set_dilation({1, 1});
conv_output = graph_fwd->conv_fprop(X, W, conv_options);

auto sigmoid_options = fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::SIGMOID_FWD);
Y = graph_fwd->pointwise(conv_output, sigmoid_options);

// Also save the pre-activation conv output; it is needed by
// SIGMOID_BWD in the backward pass.
conv_output->set_output(true);
Y->set_output(true);

And here is the code for the dgrad graph I attempted, which fails with the error No valid engine configs for SIGMOID_BWD_ConvBwdData_:

// Attempted fused backward graph: SIGMOID_BWD followed by conv_dgrad.
graph_d_bwd = std::make_shared<fe::graph::Graph>();
graph_d_bwd->set_io_data_type(fe::DataType_t::FLOAT)
    .set_intermediate_data_type(fe::DataType_t::FLOAT)
    .set_compute_data_type(fe::DataType_t::FLOAT);

dY = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("grad")
                        .set_dim({n, k, h, w})
                        .set_stride({k * h * w, 1, k * w, k}));

W_bwd = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("weight")
                        .set_dim(W->get_dim())
                        .set_stride(W->get_stride()));

conv_output_bwd = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("conv_output")
                        .set_dim(conv_output->get_dim())
                        .set_stride(conv_output->get_stride()));

// SIGMOID_BWD takes the incoming gradient dY and the saved
// pre-activation input (the forward conv output).
auto dsigmoid_options = fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::SIGMOID_BWD);
auto dsigmoid_output = graph_d_bwd->pointwise(dY, conv_output_bwd, dsigmoid_options);
dsigmoid_output->set_dim({n, k, h, w});

auto dgrad_options = fe::graph::Conv_dgrad_attributes().set_padding({0, 0}).set_stride({1, 1}).set_dilation({1, 1});
dX = graph_d_bwd->conv_dgrad(dsigmoid_output, W_bwd, dgrad_options);
dX->set_dim({n, c, h, w}).set_output(true);

zhewenhu avatar Sep 24 '24 01:09 zhewenhu

Hi @zhewenhu ,

Thanks for posting this. Unfortunately, cuDNN does not support this fused backward graph pattern (SIGMOID_BWD fused into dgrad/wgrad).

Instead, the suggestion is to split it into two graphs: one that does dSigmoid and another that does dgrad.

Let us know if you have a specific use case in mind.

Thanks

Anerudhan avatar Sep 24 '24 05:09 Anerudhan

Hi @Anerudhan ,

I also tried splitting them, but SIGMOID_BWD alone is not supported either; I got the same error: No valid engine configs for SIGMOID_BWD_. Could you check whether I did something wrong?

Here is the code:

graph_d_bwd = std::make_shared<fe::graph::Graph>();
graph_d_bwd->set_io_data_type(fe::DataType_t::FLOAT)
    .set_intermediate_data_type(fe::DataType_t::FLOAT)
    .set_compute_data_type(fe::DataType_t::FLOAT);

dY = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("grad")
                        .set_dim({n, k, h, w})
                        .set_stride({k * h * w, 1, k * w, k}));

conv_output_bwd = graph_d_bwd->tensor(fe::graph::Tensor_attributes()
                        .set_name("conv_output")
                        .set_dim(conv_output->get_dim())
                        .set_stride(conv_output->get_stride()));

auto dsigmoid_options = fe::graph::Pointwise_attributes().set_mode(fe::PointwiseMode_t::SIGMOID_BWD);
auto dsigmoid_output = graph_d_bwd->pointwise(dY, conv_output_bwd, dsigmoid_options);
dsigmoid_output->set_dim({n, k, h, w}).set_output(true);

zhewenhu avatar Sep 24 '24 13:09 zhewenhu

Hi @zhewenhu ,

I just took a look at this on an H100, and this code seems to pass. Do you know which GPU you are running this on?

Thanks

Anerudhan avatar Oct 10 '24 21:10 Anerudhan