
Gradient always equals zero when using EIGEN

Open · tstumm opened this issue 6 years ago · 1 comment

Hi,

I wanted to use your package to do some basic neural network stuff in C++. Since I wrapped the weights of the network in matrices from the Eigen library, I guess I somehow broke the computational graph. When I try to compute the gradient for each weight, GetAdjoint just returns zero for all of them. I'm not quite sure why (most probably Eigen returns a copy somewhere instead of the original object). Maybe you've come across this before and can help me.

Here's my "MWE":

#include <iostream>
#include <vector>
#include "had.h"
#include "Eigen/Eigen"

using namespace had;
DECLARE_ADGRAPH();

typedef had::AReal Number;
typedef Eigen::Matrix<Number, Eigen::Dynamic, Eigen::Dynamic > Mat;

Number sigmoid(Number x) {
    return 1.0 / (1.0 + had::exp(-x));
}

Number square_nmbr(Number x) {
    return had::square(x);
}

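// Forward pass: each layer appends a bias row of ones to its input,
// multiplies by the layer's weight matrix, and applies the sigmoid element-wise.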
Mat forward(std::vector<Mat>& weights, Mat input) {
    Mat ret = input;

    for (int i = 0; i < weights.size(); ++i) {
        ret.conservativeResize(ret.rows() + 1, ret.cols());
        ret.row(ret.rows() - 1).setOnes();
        ret = (weights[i] * ret).unaryExpr(&sigmoid);
    }

    return ret;
}

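// Sum of squared differences between two matrices (SSE).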
Number error(Mat input, Mat output) {
    return (input-output).unaryExpr(&square_nmbr).sum();
}

int main() {
    ADGraph adGraph;
    // Stores the weights of the nodes, including bias nodes
    std::vector<Mat> weights;

    // Network layout
    std::vector<unsigned int> layer_sizes {8, 3, 8};

    // Build weight matrices
    for (int i = 0; i < layer_sizes.size() - 1; ++i) {
        Mat rnd = Mat::Random(layer_sizes[i+1], layer_sizes[i] + 1);
        rnd = rnd / 10; // Start with lower values
        weights.push_back(rnd);
    }

    // Build input data
    Mat input = Mat::Identity(8, 8);

    // Forward pass
    Mat result = forward(weights, input);

    // Calculate SSE
    Number err = error(input, result);

    // Reverse-mode AD: seed the adjoint of the error and propagate it back
    SetAdjoint(err, 1.0);
    PropagateAdjoint();
    for (int i = 0; i < weights[0].rows(); ++i) {
        for (int j = 0; j < weights[0].cols(); ++j) {
            std::cout << GetAdjoint(weights[0](i, j)) << " ";
        }
        std::cout << std::endl;
    }
    return 0;
}
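For reference, here is a scalar-only check with no Eigen in the loop, using only the had calls already used above (so this is just my understanding of how the API is meant to be used). Here I would expect GetAdjoint to return the analytic sigmoid derivative, roughly 0.235 for x = 0.5:

#include <iostream>
#include "had.h"

using namespace had;
DECLARE_ADGRAPH();

int main() {
    ADGraph adGraph;
    AReal x = 0.5;
    // Same sigmoid as above, but on a plain AReal, no Eigen involved
    AReal y = 1.0 / (1.0 + had::exp(-x));
    SetAdjoint(y, 1.0);
    PropagateAdjoint();
    // Analytically dy/dx = y * (1 - y), about 0.235 for x = 0.5
    std::cout << GetAdjoint(x) << std::endl;
    return 0;
}

If this prints the expected derivative while the Eigen version above prints only zeros, that would point at the matrix wrapping rather than the AD tape itself.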

tstumm · Dec 06 '17 14:12