deepxde
NaN in MfNN example
Hi,
When running the example that reads from a dataset (the function-based version works fine):
https://github.com/lululxvi/deepxde/blob/master/examples/function/mf_dataset.py
I get a NaN in one of the test loss outputs:
Using backend: tensorflow.compat.v1
2022-03-16 14:44:58.717643: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-03-16 14:44:58.717690: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From /home/marc/.local/lib/python3.8/site-packages/tensorflow/python/compat/v2_compat.py:111: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /home/marc/.local/lib/python3.8/site-packages/deepxde/nn/initializers.py:118: The name tf.keras.initializers.he_normal is deprecated. Please use tf.compat.v1.keras.initializers.he_normal instead.
Compiling model...
Building multifidelity neural network...
/home/marc/.local/lib/python3.8/site-packages/deepxde/nn/tensorflow_compat_v1/mfnn.py:114: UserWarning: `tf.layers.dense` is deprecated and will be removed in a future version. Please use `tf.keras.layers.Dense` instead.
return tf.layers.dense(
/home/marc/.local/lib/python3.8/site-packages/keras/legacy_tf_layers/core.py:255: UserWarning: `layer.apply` is deprecated and will be removed in a future version. Please use `layer.__call__` method instead.
return layer.apply(inputs)
'build' took 0.129456 s
2022-03-16 14:45:01.859670: E tensorflow/stream_executor/cuda/cuda_driver.cc:271] failed call to cuInit: UNKNOWN ERROR (100)
2022-03-16 14:45:01.859725: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (MARC-PC): /proc/driver/nvidia/version does not exist
2022-03-16 14:45:01.859979: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
'compile' took 0.447103 s
Initializing variables...
Training model...
Step Train loss Test loss Test metric
0 [4.16e+01, 6.23e+01, 8.44e-01] [nan, 2.13e+01, 8.44e-01] [1.01e+00, 1.03e+00]
Best model at step 0:
train loss: 1.05e+02
test loss: nan
test metric: [1.01e+00, 1.03e+00]
'train' took 0.228125 s
Is there a problem with my environment setup?
I am running with:
- master branch
- Python 3.8.10
- tensorflow==2.7.0
Thanks
That is expected: the example has no test data for the low-fidelity model, so its test loss is reported as NaN.
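One plausible way such a NaN can arise (a minimal sketch for illustration, not DeepXDE's actual code): if the low-fidelity test set is empty, a mean-squared-error reduction over zero elements evaluates to NaN, since the mean of an empty array is NaN in NumPy.

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error; np.mean over an empty array yields NaN
    # (with a RuntimeWarning).
    return np.mean((y_true - y_pred) ** 2)

# High-fidelity test data exists; low-fidelity test data is empty.
y_lo_true, y_lo_pred = np.array([]), np.array([])
y_hi_true, y_hi_pred = np.array([1.0, 2.0]), np.array([1.1, 1.9])

print(mse(y_lo_true, y_lo_pred))  # nan -- like the low-fidelity test loss
print(mse(y_hi_true, y_hi_pred))  # a finite number
```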
However, the code then runs for only one epoch before stopping. Is that expected for this example?
I see the stopping condition is set here:
https://github.com/lululxvi/deepxde/blob/303ae8067d86b0b38ab06dd5701e51e17f685206/deepxde/model.py#L579-L583
Is there a way to set up the model at initialization so that the run does not stop?
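For context, the linked lines stop training as soon as any train or test loss component is NaN. Schematically (a paraphrase with hypothetical names, not the exact DeepXDE source):

```python
import numpy as np

class TrainState:
    # Minimal stand-in for DeepXDE's train state (hypothetical names).
    def __init__(self, loss_train, loss_test):
        self.loss_train = np.asarray(loss_train)
        self.loss_test = np.asarray(loss_test)

def should_stop(state):
    # Halt training if any train or test loss component is NaN.
    return bool(
        np.isnan(state.loss_train).any() or np.isnan(state.loss_test).any()
    )

# With a NaN in the test losses (as in the log above),
# training halts after the first evaluation.
state = TrainState([41.6, 62.3, 0.844], [float("nan"), 21.3, 0.844])
print(should_stop(state))  # True
```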
Yes, you are right. Please install the updated version v1.1.2.
If you install a DeepXDE version newer than 1.1.2, such as 1.1.3, see the release notes for how to recover exactly the same behavior as before: https://github.com/lululxvi/deepxde/releases/tag/v1.1.3
Thank you, that works.
Final question:
The figure obtained for the dataset run is:
In comparison, this is the one obtained for the function version:
For the upper curve, some of the training points sit at 0. Is this correct?
You can ignore those points.