
Major memory leak on somewhat advanced models using CPU backend

Open miketout opened this issue 8 years ago • 3 comments

Hi, I've actually forked and have been using Neon for some time. I have built some fairly large systems with it, but I hadn't run into these issues until I recently updated my source tree to 2.3+.

I previously added some capabilities in 1.8, such as being able to nest various trees, merge containers, and more, and have more recently updated to 2.3+ and ported some of my previous work to the newer version. Right now, I'm building a text classification engine, which is working great except for two major issues:

  1. The MKL backend has a number of bugs and simply doesn't work on my convolutional models. I recently fixed some MergeBroadcast bugs in my fork, but that backend still doesn't work on my models.
  2. Using the CPU backend, there is a major memory leak on my convolutional models, which seems to be in xprop_conv and forces me to reboot our server to reclaim the memory. I even tried regenerating the backend every time, but nothing I do short of closing the process addresses the leak. During the fit operation, even if I comment out back propagation, I lose so much memory with every minibatch that we will either need a fix or have to switch away from Neon for the project (a rough sketch follows this list).
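
To give a sense of the shape of the problem, here's a rough, self-contained sketch, not code from the actual project, of the kind of setup where I see the leak: a small padded convolutional model trained on random data with the CPU backend, printing resident memory after each epoch. Layer sizes, padding values, and data shapes here are placeholders.

```python
# Sketch only: a small padded conv model on the CPU backend, printing peak
# resident memory after each epoch so growth is visible. Not the real project.
import resource
import numpy as np

from neon.backends import gen_backend
from neon.callbacks.callbacks import Callbacks
from neon.data import ArrayIterator
from neon.initializers import Gaussian
from neon.layers import Affine, Conv, GeneralizedCost, Pooling
from neon.models import Model
from neon.optimizers import GradientDescentMomentum
from neon.transforms import CrossEntropyMulti, Rectlin, Softmax

be = gen_backend(backend='cpu', batch_size=32)

# Random stand-in data: 320 "RGB 32x32 images", 10 classes.
X = np.random.rand(320, 3 * 32 * 32).astype(np.float32)
y = np.random.randint(0, 10, 320)
train = ArrayIterator(X, y, nclass=10, lshape=(3, 32, 32))

init = Gaussian(scale=0.01)
layers = [
    # Explicit padding on height and width; the leak shows up for me with
    # padded convolutions on the CPU backend.
    Conv((3, 3, 16), init=init, padding={'pad_h': 1, 'pad_w': 1},
         activation=Rectlin()),
    Pooling(2),
    Affine(nout=10, init=init, activation=Softmax()),
]
model = Model(layers=layers)
cost = GeneralizedCost(costfunc=CrossEntropyMulti())
opt = GradientDescentMomentum(0.01, momentum_coef=0.9)

# Run one epoch at a time (fit continues from the current epoch index when
# called again with a larger num_epochs) and print peak RSS after each one.
for epoch in range(5):
    model.fit(train, optimizer=opt, num_epochs=epoch + 1, cost=cost,
              callbacks=Callbacks(model))
    rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("after epoch %d: max RSS %d kB" % (epoch + 1, rss_kb))
```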

I'd be happy to provide more information if you contact me directly. I can provide a repro privately, since the source code I'm working on has not yet been released, although we expect to release it as an open-source, real-world ML project when it is finished.

miketout avatar Dec 04 '17 05:12 miketout

It looks like the issue is related to padding. I have removed the padding we didn't need from our model, and the leak is gone. For now, I'd consider this a workaround. If you're interested, I will try to put together a repro case in the near future; a sketch of the change is below.
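
To illustrate the workaround (again with placeholder layer parameters, not the real model), the change amounts to dropping the padding we didn't actually need:

```python
from neon.backends import gen_backend
from neon.initializers import Gaussian
from neon.layers import Conv
from neon.transforms import Rectlin

be = gen_backend(backend='cpu', batch_size=32)
init = Gaussian(scale=0.01)

# Before: explicit height/width padding -- this is where I see the leak.
leaky = Conv((3, 3, 16), init=init, padding={'pad_h': 1, 'pad_w': 1},
             activation=Rectlin())

# After: the unneeded padding removed; with this change the leak is gone.
fixed = Conv((3, 3, 16), init=init, padding=0, activation=Rectlin())
```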

miketout avatar Dec 05 '17 04:12 miketout

Thanks @miketout for digging into the above issue and coming up with a workaround. We will take a look and try to reproduce it after you release the open source ML project.

wei-v-wang avatar Dec 05 '17 06:12 wei-v-wang

Here's a link to the open source project, which was just released by Synacor: https://github.com/Zimbra/zimbra-ml

It requires some modifications I made to Neon in my fork, primarily to allow broadcast or multistream containers to be fed by tree or sequential graphs above them. I also disable MKL, since I've found that it doesn't run; I'll probably open a separate issue for that. A sketch of the backend selection is below.
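
For completeness, disabling MKL just means generating the plain CPU backend instead of the MKL one (the batch size here is a placeholder):

```python
from neon.backends import gen_backend

# Force the plain CPU backend; the MKL backend (backend='mkl') doesn't run
# for this model, so I avoid it and will open a separate issue for that.
be = gen_backend(backend='cpu', batch_size=64)
```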

As far as this memory leak goes, I could probably reproduce it, but I believe it was due to unintended padding on the height dimension, which I've since removed completely from the code.

miketout avatar Jan 06 '18 01:01 miketout