
WIP: gradient support

Open migueldeicaza opened this issue 8 years ago • 14 comments

This requires the C API to get support for it.

There is a bug here: https://github.com/tensorflow/tensorflow/issues/6268

  • [ ] Add port of the test suite from tensorflow/908d5b6ede6ae829dff138a873eec397ef434cd6

migueldeicaza avatar Feb 04 '17 04:02 migueldeicaza

It seems gradient support has recently been addressed :-) https://github.com/tensorflow/tensorflow/issues/6268

JimSEOW avatar Apr 05 '17 21:04 JimSEOW

Not quite :-)

Still waiting on it.

migueldeicaza avatar Apr 15 '17 01:04 migueldeicaza

Added the basic binding, but I have not written the tests; I updated the bug description to track that.

Additionally, per the tensorflow commit [0], not all capabilities from the C++ API have been surfaced yet.

[0] https://github.com/tensorflow/tensorflow/commit/908d5b6ede6ae829dff138a873eec397ef434cd6

migueldeicaza avatar Apr 29 '17 02:04 migueldeicaza

Is it now possible to retrieve the gradients and do some form of gradient descent?

mfagerlund avatar Aug 01 '17 03:08 mfagerlund

I would like to ask the same question: using the new pre-release package that was just uploaded to NuGet (1.3.0-pre1), is it already possible to retrieve gradients for tensors and perform some form of gradient descent? I am not fully up to date on the status of the C API support for this feature, so I figured it would be easier to ask :-)

cesarsouza avatar Aug 27 '17 13:08 cesarsouza

@migueldeicaza Hi, I started to verify AddGradients() API with code like this:

    var x = graph.Const (3.0);

    var y = graph.Square (x);    // y  = x^2
    var y1 = graph.Square (y);   // y1 = x^4
    var y2 = graph.Square (y1);  // y2 = x^8

    var g = graph.AddGradients (new TFOutput [] { y, y2 }, new [] { x });

    var r = session.Run (new TFOutput [] { }, new TFTensor [] { }, g);
    double dy = (double) r [0].GetValue ();
    double dy2 = (double) r [1].GetValue ();
    Assert.Equal (17502.0, dy + dy2);

and hit a couple of problems:

1. 'cstatus.Handle' should be passed to TF_AddGradients(), not the 'status.Handle' variable.
2. According to the documentation (d(y[0] + y[1] + ...)/dx[0], d(y[0] + y[1] + ...)/dx[1]), I expect two results from that run, 6 (the y derivative) and 17496 (the y2 derivative), but the API returns only one result, 6.
3. If the number of inputs > 1, the API fails with an access to unprotected memory, for instance: var g = graph.AddGradients (new TFOutput [] { y, y2 }, new [] { x, x2 });

Problem 1 is quite simple to fix; what do you think about 2 and 3? Are they binding issues or problems in the underlying native API? Thanks.
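For reference, the expected numbers in the snippet come straight from the calculus: with x = 3, y = x^2 gives dy/dx = 2x = 6, and y2 = ((x^2)^2)^2 = x^8 gives dy2/dx = 8x^7 = 17496, which sum to the 17502 asserted in the test. A dependency-free Python check of that arithmetic (no TensorFlow involved, just the math behind the assertion):

```python
# Analytic derivatives for the example in the snippet above.
x = 3.0
dy = 2 * x          # d(x^2)/dx at x = 3
dy2 = 8 * x ** 7    # d(x^8)/dx at x = 3

print(dy)        # 6.0
print(dy2)       # 17496.0
print(dy + dy2)  # 17502.0
```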

Dorokhov avatar Sep 04 '17 06:09 Dorokhov

Hi @Dorokhov,

If I understood TensorFlow's documentation for AddGradients correctly, the return vector should have the same length as the inputs vector, so the answer would indeed have just one value (cf. the doc: "The partial derivatives are returned in dy. dy should be allocated to size nx."; in this case, nx is the length of new [] { x }, which is 1).

But maybe I am wrong, I am just starting with the gradients API. Let me see if I can help find where the problem is.
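That reading can be sanity-checked numerically without going through the C API at all: summing the two fetched outputs into a single function f(x) = x^2 + x^8 and taking a central finite difference at x = 3 yields one value per input, matching the documented d(y[0] + y[1] + ...)/dx[0] shape. A plain-Python sketch (an illustration of the semantics, not a test of the binding):

```python
def f(x):
    # Sum of the two outputs from the earlier snippet:
    # y = x^2 and y2 = x^8.
    return x ** 2 + x ** 8

h = 1e-5
x = 3.0
# One partial derivative per input (nx = 1 here), not one per output.
grad = (f(x + h) - f(x - h)) / (2 * h)
print(grad)  # ~17502 = 6 + 17496
```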

Regards, Cesar

cesarsouza avatar Sep 09 '17 12:09 cesarsouza

Hi @cesarsouza,

I think you are right, and the API should produce one value: the partial derivative of the sum of the 'y' values, which is 17502 (6 + 17496) in my case, but it returns 6. I will try to test the same case with the native API.

Thanks.

Dorokhov avatar Sep 11 '17 12:09 Dorokhov

I'm struggling with getting simple models built in Python "productionized" into C#. This is what I see as a first-class use case for this library, but I am not seeing a good example of it.

Take a model built, trained, and saved in Python, and save off the *.pb file (from what I can gather, that is all I need). Then, in C#: create a graph, a session, input variables, input values, and outputs to run the model; run the model; then interrogate the output.

The most challenging part for me is constructing the input variables, input values, and outputs. The construction patterns and instance usages for those objects are befuddling me, to say the least.

I assume I am not seeing a pattern in the examples, or that I am fighting the naming conventions, or... I don't know what I am missing... And I can't be the only person struggling with this.

The Python code is attached. It's a simple linear regression model I ripped off from a Stanford class and made more portable.

Here are the samples I am trying to create. Happy to donate them to the cause when they are complete. FireTheftLinearRegression.py.txt

FireTheftLinearRegression.cs.txt

sqlBender avatar Sep 15 '17 19:09 sqlBender

Hi @sqlBender,

I have to say I also share your pain, but I am not sure the (actually very relevant) issue you have raised is connected to the original topic of this issue, that is, the ability to obtain automatic gradient calculations through the AddGradients method.

If you want to take a look, the Keras Sharp project is aiming at providing an API that is very similar to its Python equivalent, but unfortunately that project is also still a bit blocked until this very issue here gets eventually addressed.

Regards, Cesar

cesarsouza avatar Sep 15 '17 20:09 cesarsouza

@cesarsouza it's just where I found a similar issue regarding gradients. I'll spin up a new issue since my problem is not really related to this thread.

sqlBender avatar Sep 15 '17 21:09 sqlBender

Started discussing the gradient API issue in the tensorflow repository; hoping we will find out how the API actually works soon.

Dorokhov avatar Sep 28 '17 08:09 Dorokhov

Seems like TF doesn't have all gradient operations defined yet. I am referencing here an issue I've just created in TF's issue tracker regarding a missing gradient for tf.select.

cesarsouza avatar Nov 23 '17 21:11 cesarsouza

Hi @migueldeicaza, @cesarsouza, the linear regression example can now run on another .NET binding library; check the code here.

Here is a piece of the code:

            // tf Graph Input
            var X = tf.placeholder(tf.float32);
            var Y = tf.placeholder(tf.float32);

            // Set model weights 
            // We can set a fixed init value in order to debug
            // var rnd1 = rng.randn<float>();
            // var rnd2 = rng.randn<float>();
            var W = tf.Variable(-0.06f, name: "weight");
            var b = tf.Variable(-0.73f, name: "bias");

            // Construct a linear model
            var pred = tf.add(tf.multiply(X, W), b);

            // Mean squared error
            var cost = tf.reduce_sum(tf.pow(pred - Y, 2.0f)) / (2.0f * n_samples);

            // gradient descent
            // Note, minimize() knows to modify W and b because Variable objects are trainable=True by default
            var optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost);

            // Initialize the variables (i.e. assign their default value)
            var init = tf.global_variables_initializer();
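For readers who want to see what minimize(cost) is doing under the hood, the same model can be trained with hand-written gradients. A dependency-free Python sketch (the data, learning rate, and iteration count are made up for illustration; only the initial W, b, and the cost formula mirror the snippet above):

```python
# Linear model pred = X*W + b with cost = sum((pred - Y)^2) / (2*n),
# trained by plain gradient descent with hand-derived partials.
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.0, 4.0, 6.0, 8.0]   # synthetic data lying on Y = 2*X
n = len(X)

W, b = -0.06, -0.73        # same initial values as the snippet
learning_rate = 0.1        # chosen for this toy data

def cost(W, b):
    return sum((X[i] * W + b - Y[i]) ** 2 for i in range(n)) / (2 * n)

for _ in range(1000):
    # d(cost)/dW and d(cost)/db, derived from the cost formula
    dW = sum((X[i] * W + b - Y[i]) * X[i] for i in range(n)) / n
    db = sum((X[i] * W + b - Y[i]) for i in range(n)) / n
    W -= learning_rate * dW
    b -= learning_rate * db

print(W, b, cost(W, b))  # W approaches 2.0, b approaches 0.0, cost approaches 0
```

This is the same update rule tf.train.GradientDescentOptimizer applies; minimize() simply computes dW and db via the graph's registered gradient functions instead of by hand.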

Deep-Blue-2013 avatar Feb 22 '19 06:02 Deep-Blue-2013