TensorFlow.NET
Memory leak on Linux
Calling Session.close() does not release resources, and the TF threads are not stopped.
Example project demonstrating the thread count increase: https://github.com/deadman2000/TFNetMemoryLeak
It's not a TF bug; I tested a similar project in C: https://github.com/deadman2000/TFCThreadTest
It's a Linux-only problem. On Windows the resources are released successfully.
I suspect the root of the problem is the use of ThreadLocal: the Graph is never disposed.
@deadman2000 You have to dispose the Graph explicitly.
Manually disposing the Graph does not help:
```csharp
session.graph.Dispose();
session.close();
```
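For reference, the full cleanup sequence discussed here can be sketched as follows. This is a minimal sketch, not the library's prescribed pattern: it assumes the TF.NET `Session` and `Graph` types implement `IDisposable` (as the dispose log below indicates), and the `model/` path is the one from the log.

```csharp
using Tensorflow;

class Program
{
    static void Main()
    {
        // Load a SavedModel; this is the call the thread identifies as leaking.
        var session = Session.LoadFromSavedModel("model/");
        try
        {
            // ... run inference here ...
        }
        finally
        {
            // Dispose in dependency order: graph first, then the session.
            // Per the reports in this thread, on Linux the native TF threads
            // may still survive even after both Dispose calls.
            session.graph.Dispose();
            session.Dispose();
        }
    }
}
```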
Test log:
2019-10-10 12:42:43.235019: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: model/
2019-10-10 12:42:43.238466: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-10-10 12:42:43.249303: I tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2019-10-10 12:42:43.276624: I tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle at path: model/
2019-10-10 12:42:43.284454: I tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 49441 microseconds.
Dispose tf.Tensor '<unnamed Operation>' shape=(1,2) dtype=TF_FLOAT
Dispose Tensor disposing:True _disposed:False
TF_DeleteTensor
Dispose grap-key-45/, (39260016)
Dispose Graph disposing:True _disposed:False
TF_DeleteGraph
Dispose Tensorflow.Session
Dispose Session disposing:True _disposed:False
DisposeUnmanagedResources
TF_DeleteSession
Dispose Tensorflow.Status
Dispose Status disposing:True _disposed:False
TF_DeleteStatus
End
Threads: 71
Press Q to break or any another to repeat
2019-10-10 12:42:43.928763: I tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: model/
2019-10-10 12:42:43.932001: I tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-10-10 12:42:43.941843: I tensorflow/cc/saved_model/loader.cc:202] Restoring SavedModel bundle.
2019-10-10 12:42:43.969191: I tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle at path: model/
2019-10-10 12:42:43.973776: I tensorflow/cc/saved_model/loader.cc:311] SavedModel load for tags { serve }; Status: success. Took 45020 microseconds.
Dispose tf.Tensor '<unnamed Operation>' shape=(1,2) dtype=TF_FLOAT
Dispose Tensor disposing:True _disposed:False
TF_DeleteTensor
Dispose grap-key-49/, (38424432)
Dispose Graph disposing:True _disposed:False
TF_DeleteGraph
Dispose Tensorflow.Session
Dispose Session disposing:True _disposed:False
DisposeUnmanagedResources
TF_DeleteSession
Dispose Tensorflow.Status
Dispose Status disposing:True _disposed:False
TF_DeleteStatus
End
Threads: 74
Press Q to break or any another to repeat
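The thread counts printed above (71, then 74 one iteration later) can be captured per iteration with only standard .NET APIs. A minimal sketch; the comment marks where the TF.NET load/dispose cycle from the repro project would go:

```csharp
using System;
using System.Diagnostics;

class ThreadCountProbe
{
    static void Main()
    {
        for (var i = 0; i < 3; i++)
        {
            // ... load and dispose a SavedModel here, as in the repro project ...

            // Count the OS threads of the current process.
            var count = Process.GetCurrentProcess().Threads.Count;
            Console.WriteLine($"Threads: {count}"); // in the repro this grows by ~3 per cycle on Linux
        }
    }
}
```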
This might also leak on Linux for an entirely different reason, but for me it happens on Windows: every time you run the program (press any key in the console program in the repository) there is memory build-up.
Only the following line causes the leak (with or without `.as_default()`):

```csharp
var session = Session.LoadFromSavedModel(modelLocation);
```
Therefore the leak is in loading a saved model and disposing it later on. @Oceania2018, for your attention.
I have the same problem on Linux.
I'm having a similar problem: I've got about 20 unit tests, and memory isn't being completely returned after each one. I'm calling:

```csharp
_session.graph.Dispose();
_session.Dispose();
```
I'm seeing slight growth after each test that loads a saved model.
I am also using Session.LoadFromSavedModel(modelLocation);
@gosha20777 @LukeBolly Could you PR a minimal runnable example into https://github.com/SciSharp/TensorFlow.NET/tree/master/src/TensorFlowNet.Benchmarks/Leak?
@Oceania2018 I haven't had time to create a repro for you, but while debugging another issue I came across objects left over from TensorFlow.NET in a memory dump taken after my TF work had finished and I had called Dispose on the graph and session.
Confirming the objects are still in memory:

- NumSharp
- TensorFlow.NET
Could this be the UnmanagedMemoryBlocks not being released?
@LukeBolly Could you run it on the latest release? We've removed the NumSharp dependency.
Hi @Oceania2018, I've updated to the latest version. After running all of my unit tests and checking the managed memory at the end, there are still a large number of Tensorflow objects left in memory.
I've gone through my code and added Dispose() to all graphs, sessions, and NDArrays, but I still end up with these objects locked in memory:
While I can't share the model I'm using, loading and disposing it in a loop confirms there is an issue cleaning up resources when working with a SavedModel.
```csharp
public class TestModel
{
    public TestModel(string classifierModelPath)
    {
        // Load and dispose the model repeatedly; memory grows each iteration.
        for (var i = 0; i < 1000; i++)
        {
            var classifierSession = Session.LoadFromSavedModel(classifierModelPath);
            classifierSession.graph.Exit();
            classifierSession.graph.Dispose();
            classifierSession.Dispose();
        }
    }
}
```
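To tell a managed-heap leak apart from native memory that TF never returns, it can help to force a full GC and log both managed and process memory each iteration. A sketch using only standard .NET APIs; the comment marks where the load/dispose sequence above would go:

```csharp
using System;
using System.Diagnostics;

class LeakProbe
{
    static void Main()
    {
        for (var i = 0; i < 10; i++)
        {
            // ... Session.LoadFromSavedModel(...) + the Dispose() calls go here ...

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect(); // collect objects kept alive by finalizers

            // Managed heap size vs. total process memory: if workingSet grows
            // while managed stays flat, the leak is in native (TF) allocations.
            var managed = GC.GetTotalMemory(forceFullCollection: true);
            var workingSet = Process.GetCurrentProcess().WorkingSet64;
            Console.WriteLine($"iter {i}: managed={managed:N0} B, workingSet={workingSet:N0} B");
        }
    }
}
```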
@LukeBolly Does this apply to any model? Do other models cause the same issue?
Another way that might release the resources is `tf.Context.reset_context()`.
Yep, I've run it again with a very simple model, which is just a bunch of Conv layers and a call signature, and I'm seeing the same behavior. Adding `tf.Context.reset_context()` did not resolve the issue.
Here are all the operations in the model:
This model is small and initialises much faster, so the chart is smoother, but it's the same behavior. It's looped about 500 times here:
I can probably get a repro up for you tomorrow if you need it.
@LukeBolly It would be very helpful if you could create a runnable project to reproduce this issue.
I'll try to get a repro up for you tomorrow if I get time.
@Oceania2018 Unfortunately the fix has broken LoadFromSavedModel entirely, see here:
The graph is disposed as soon as it loads, as a result it isn't usable in any way.
@LukeBolly Sorry about that, I disposed the graph accidentally; it will be fixed in a future release.
@Oceania2018 I've put up a PR that fixes the graph issue and extended the test to ensure it runs; it seems like there is still a small leak somewhere, though: #858