Memory leak due to Graph and Op not being reclaimed
Hi Anthony,
The following code leaks Graph and Op objects in my environment:
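    import org.platanios.tensorflow.api._

    def test(): Unit = {
      val graph = Graph()
      val session: Session = Session(graph)
      try {
        // Add 10,240 placeholder ops to the graph.
        for (_ <- 0 until 1024 * 10) tf.createWith(graph)(tf.placeholder[Float](Shape(-1, 2, 2)))
      } finally {
        // Explicitly release the session and graph; afterwards the Graph and its Ops
        // are expected to become eligible for garbage collection.
        session.close()
        graph.close()
      }
    }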
If I call the test method repeatedly, the Graph and Op objects are never garbage collected. In VisualVM, the path to the GC root goes through org.platanios.tensorflow.api.utilities.Disposer.records.
Here is an example that should compile: https://gist.github.com/huynhjl/00a9ee6958f1b0143b701eb7b2563005
Let me know if I'm doing anything wrong.
From a memory dump, it looks like Disposer.records indirectly holds a reference to the graph, which prevents it from being garbage collected. This is because Session.apply adds a closeFn function to graph.nativeHandleWrapper.preCleanupFunctions so that the graph cleans up the session and closes graph.reference, but that registration is itself what prevents garbage collection.
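A minimal, self-contained sketch of this failure mode follows, assuming a Disposer-like design in which a static map associates PhantomReferences with cleanup closures. ToyDisposer, ToyGraph and LeakDemo are hypothetical names, not TF Scala's actual classes. The sketch shows only the general mechanism: a cleanup closure stored in a static map that captures the very object it is meant to clean up keeps that object strongly reachable, so its phantom reference is never enqueued and the cleanup never runs.

```scala
import java.lang.ref.{PhantomReference, ReferenceQueue, WeakReference}
import java.util.concurrent.ConcurrentHashMap

// Hypothetical, simplified Disposer-style registry (not TF Scala's implementation).
object ToyDisposer {
  private val queue = new ReferenceQueue[AnyRef]()

  // Static map: every cleanup closure stored here is strongly reachable,
  // and so is everything that closure captures.
  val records = new ConcurrentHashMap[PhantomReference[AnyRef], () => Unit]()

  // Registers `cleanup` to run once `owner` becomes phantom reachable.
  def add(owner: AnyRef, cleanup: () => Unit): Unit = {
    records.put(new PhantomReference[AnyRef](owner, queue), cleanup)
    ()
  }

  // Runs the cleanup of every owner the GC has enqueued so far; returns how many ran.
  def drain(): Int = {
    var ran = 0
    var ref = queue.poll()
    while (ref != null) {
      val cleanup = records.remove(ref)
      if (cleanup != null) { cleanup(); ran += 1 }
      ref = queue.poll()
    }
    ran
  }
}

// Hypothetical stand-in for a graph wrapping a native handle.
final class ToyGraph(val handle: Long)

object LeakDemo {
  // Leaky: the closure refers to `g`, so ToyDisposer.records pins the graph forever.
  def registerLeaky(): WeakReference[ToyGraph] = {
    val g = new ToyGraph(42L)
    ToyDisposer.add(g, () => println(s"freeing native handle ${g.handle}"))
    new WeakReference[ToyGraph](g)
  }

  // Fixed: the closure captures only the primitive handle, not the graph.
  def registerFixed(): WeakReference[ToyGraph] = {
    val g = new ToyGraph(7L)
    val handle = g.handle
    ToyDisposer.add(g, () => println(s"freeing native handle $handle"))
    new WeakReference[ToyGraph](g)
  }

  // Requests GCs until the referent is cleared or the attempts run out.
  def collectedAfterGc[T <: AnyRef](ref: WeakReference[T], attempts: Int = 20): Boolean = {
    var i = 0
    while (ref.get() != null && i < attempts) {
      System.gc()
      Thread.sleep(50)
      i += 1
    }
    ref.get() == null
  }
}
```

In this sketch the leaky ToyGraph stays reachable through ToyDisposer.records no matter how often the GC runs, while the fixed one can be collected because its closure holds only a Long. The Javadoc for java.lang.ref.Cleaner states the same rule: the cleaning action should not refer to the object being registered. Whether closeFn in TF Scala can be restructured along these lines is not established here.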
@eaplatanios I think this issue matters most for large-scale experimentation, training, and hyper-parameter tuning with TF Scala.
Currently, implementations such as TunableTFModel in the DynaML API rely on graph.close() to free up resources.
Let us know if there is any way I, @huynhjl, or others can help resolve this, although my understanding of the codebase is still fairly high-level.
cc @sbrunk @lucataglia @DirkToewe
I'm going to look into it.
I'm sorry I've been off TF Scala for a while, working on other projects. @mandar2812 @DirkToewe @sbrunk if you're interested, we could have a conference call at some point to help you understand the codebase at a deeper level. Just let me know and we can plan it.
@eaplatanios I would love that! Maybe we should set up a Doodle poll and pick a time that's okay for all of us? What do you think, @sbrunk @DirkToewe?
A tour of the project would be greatly appreciated! I just need about two days to look at the code again (it's been a while) so I can ask better questions. There is an unofficial TensorFlow(JS) Discord server that we could use to coordinate and talk.
I've been a bit disconnected from TF Scala since I left academia but I'd still be interested in joining a call about the codebase. I'm also super interested in what you think about Swift for TF since I've seen you've worked with it too :)
Sounds good to me! And yes, I've been working on Swift for TF for quite some time now and would also be happy to talk about that. :) Does someone want to coordinate this? A doodle poll may be a good start. I'm sorry but I've been super busy lately.
@eaplatanios @sbrunk @DirkToewe I'll set up a Doodle poll this weekend.
@mandar2812 just a gentle ping about the poll. We can also schedule it informally here. My schedule is quite flexible over the next week.
@eaplatanios sorry for the huge delay in setting up the Doodle :D. I'm finishing my thesis next week, so I would prefer sometime in the last 10 days of August. Is that okay for you guys?
Fine with me.