
.NET Keras does not converge compared to Python Keras

Open jjaskulowski opened this issue 2 years ago • 7 comments

This does not converge:

using static Tensorflow.KerasApi;
using static Tensorflow.tensorflow;
using Tensorflow;
using Tensorflow.NumPy;
 
var inputs = np.array(new float[,] { { 0, 0 }, { 0, 1 }, { 1, 0 }, { 1, 1 } });
var outputs = np.array(new float[] { 0, 1, 1, 0 });
var model = keras.Sequential();
model.add(keras.layers.InputLayer(new Shape(2)));
model.add(keras.layers.Dense(4, keras.activations.Tanh));
model.add(keras.layers.Dense(4, keras.activations.Tanh));
model.add(keras.layers.Dense(1, keras.activations.Sigmoid));
model.compile(keras.optimizers.SGD(0.1f), keras.losses.MeanSquaredError(), new [] { "mae" });

model.fit(inputs,  outputs, epochs: 1000);

var pred_outputs = model.predict(inputs);

foreach (var output in pred_outputs)
{
    Console.WriteLine(string.Join(",", output.ToArray<float>()));
}

while this equivalent does:

import tensorflow as tf
import timeit
import numpy as np

device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  print(
      '\n\nThis error most likely means that this notebook is not '
      'configured to use a GPU.  Change this in Notebook Settings via the '
      'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
  raise SystemError('GPU device not found')

def gpu():
  with tf.device('/device:GPU:0'):
    
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=2))
    model.add(tf.keras.layers.Dense(4, activation="tanh"))
    model.add(tf.keras.layers.Dense(4, activation="tanh"))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(tf.keras.optimizers.SGD(0.1), loss="mse", metrics=["mae"])
    model.fit([[1,1], [1,0], [0,1], [0,0]], [[0],[1],[1],[0]], epochs= 1000 )
    print(model.predict([[1,1], [1,0], [0,1], [0,0]]))
# We run each op once to warm up; see: https://stackoverflow.com/a/45067900
gpu()

Dependencies:

    <PackageReference Include="NumSharp" Version="0.30.0" />
    <PackageReference Include="SciSharp.TensorFlow.Redist" Version="2.10.0" />
    <PackageReference Include="SciSharp.TensorFlow.Redist-Windows-GPU" Version="2.10.0" />
    <PackageReference Include="TensorFlow.Keras" Version="0.10.0" />
    <PackageReference Include="TensorFlow.NET" Version="0.100.0" />

jjaskulowski • Feb 11 '23 20:02

I ran the code and got loss=0.000765 after the 1000th epoch in C# and loss=0.0122 in Python.

It seems that it converges in C# but not in Python. Is it the same on your device?

Either way, it seems that something in tf.net.keras does not align with tf.keras.

AsakusaRinne • Feb 13 '23 10:02

@AsakusaRinne In my local environment it does not converge at all:

2023-02-13 23:04:10.069351: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epoch: 001/1000
0001/0001 [==============================>] - 231ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 002/1000
0001/0001 [==============================>] - 5ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 003/1000
0001/0001 [==============================>] - 3ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 004/1000
0001/0001 [==============================>] - 4ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 005/1000
0001/0001 [==============================>] - 7ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 006/1000
0001/0001 [==============================>] - 4ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 007/1000
0001/0001 [==============================>] - 5ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 008/1000
0001/0001 [==============================>] - 5ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 009/1000
0001/0001 [==============================>] - 5ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 010/1000
0001/0001 [==============================>] - 5ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 011/1000
0001/0001 [==============================>] - 4ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 012/1000
0001/0001 [==============================>] - 3ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 013/1000
0001/0001 [==============================>] - 3ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 014/1000
0001/0001 [==============================>] - 3ms/step loss: 0.247893, mean_absolute_error: 0.494114
Epoch: 015/1000
0001/0001 [==============================>] - 5ms/step loss: 0.247893, mean_absolute_error: 0.494114
....

Later, the predicted outputs are as follows: 0.5, 0.43938702, 0.47922894, 0.39507216
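
As a quick sanity check (not part of the original report), the logged metrics can be recomputed from the four predictions above. This is only a sketch: it assumes the predictions are listed in the same order as the inputs in the C# snippet, so the XOR targets are 0, 1, 1, 0, and the variable names are illustrative.

```python
# Predictions copied from the log above; targets assume the C# input order
# {0,0}, {0,1}, {1,0}, {1,1} (an assumption, the thread does not state it).
predictions = [0.5, 0.43938702, 0.47922894, 0.39507216]
targets = [0.0, 1.0, 1.0, 0.0]

# Mean squared error: average of the squared differences.
mse = sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Mean absolute error: average of the absolute differences.
mae = sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

print(f"mse = {mse:.6f}")
print(f"mae = {mae:.6f}")
```

Under that ordering assumption, both values round to the logged 0.247893 and 0.494114, which is consistent with the weights not having changed between the first epoch and the final prediction.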

jjaskulowski • Feb 13 '23 22:02

@jjaskulowski Could you please provide the versions of tf.net and tf.net.keras you used, along with your device information?

AsakusaRinne • Feb 14 '23 02:02

@AsakusaRinne

The app was a simple new C# .NET Core command-line app created in VS 2022 Pro.

    <PackageReference Include="NumSharp" Version="0.30.0" />
    <PackageReference Include="SciSharp.TensorFlow.Redist" Version="2.10.0" />
    <PackageReference Include="SciSharp.TensorFlow.Redist-Windows-GPU" Version="2.10.0" />
    <PackageReference Include="TensorFlow.Keras" Version="0.10.0" />
    <PackageReference Include="TensorFlow.NET" Version="0.100.0" />

OS Name:                   Microsoft Windows 10 Pro
OS Version:                10.0.19044 N/A Build 19044
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Workstation
OS Build Type:             Multiprocessor Free
Registered Owner:          N/A
Registered Organization:   N/A
Product ID:                00342-50478-94012-AAOEM
Original Install Date:     11/05/2022, 11:29:07
System Boot Time:          02/02/2023, 20:37:18
System Manufacturer:       Dell Inc.
System Model:              Latitude E7470
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 78 Stepping 3 GenuineIntel ~2396 Mhz
BIOS Version:              Dell Inc. 1.3.0, 14/02/2016
Windows Directory:         C:\WINDOWS
System Directory:          C:\WINDOWS\system32
Boot Device:               \Device\HarddiskVolume2
System Locale:             en-gb;English (United Kingdom)
Input Locale:              en-gb;English (United Kingdom)
Time Zone:                 (UTC+01:00) Sarajevo, Skopje, Warsaw, Zagreb
Total Physical Memory:     16,267 MB
Available Physical Memory: 4,978 MB
Virtual Memory: Max Size:  21,007 MB
Virtual Memory: Available: 2,525 MB
Virtual Memory: In Use:    18,482 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    WORKGROUP
Logon Server:              \\DESKTOP-CEEUBV8
Hotfix(s):                 13 Hotfix(s) Installed.
                           [01]: KB5020872
                           [02]: KB5003791
                           [03]: KB5012170
                           [04]: KB5022282
                           [05]: KB5007273
                           [06]: KB5014032
                           [07]: KB5014035
                           [08]: KB5014671
                           [09]: KB5015895
                           [10]: KB5016705
                           [11]: KB5018506
                           [12]: KB5020372
                           [13]: KB5003242

jjaskulowski • Feb 14 '23 09:02

It's quite confusing that I cannot reproduce it. Everything works fine in my local environment with the dependencies above. Could you please try the following steps to help us locate the error?

  1. Just rebuild your project and run again.
  2. Remove the package SciSharp.TensorFlow.Redist-Windows-GPU and try again.

BTW, could you please tell us your CUDA version and dotnet version? @jjaskulowski
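
For anyone unsure where to find these versions, here is a minimal Python sketch that asks the tools directly. It assumes the .NET SDK provides `dotnet --version` and the CUDA toolkit provides `nvcc --version` on PATH. `tool_version` is a hypothetical helper name, not part of any library. A tool that is not installed is reported as not found, which would be expected for nvcc on a CPU-only machine.

```python
import shutil
import subprocess


def tool_version(command, flag="--version"):
    """Return what `command flag` prints, or None if the command is not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, flag],
        capture_output=True,
        text=True,
        check=False,
        timeout=60,
    )
    # Some tools print their version to stderr, so fall back to it.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    # dotnet: .NET SDK version; nvcc: CUDA toolkit compiler version.
    for tool in ("dotnet", "nvcc"):
        version = tool_version(tool)
        print(f"{tool}: {version if version is not None else 'not found on PATH'}")
```

The output can be pasted into the issue as-is.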

AsakusaRinne • Feb 15 '23 13:02

@jjaskulowski Hey, have you solved it? I'm quite interested in it. :)

AsakusaRinne • Feb 27 '23 19:02

Nope, but I'm not sure where to find the data I've been asked for. I'm also computing on the CPU, so I'm not sure how that is related to CUDA or DNN.

On Mon, 27 Feb 2023 at 20:08, Yaohui Liu @.***> wrote:

@jjaskulowski Hey, have you solved it? I'm quite interested in it. :)

jjaskulowski • Feb 28 '23 15:02