Memory leak on server when using -repeat with fswatcher

Open bruceg opened this issue 4 years ago • 9 comments

I am using Unison 2.48.4 on Ubuntu on my laptop with the command line unison -ui text -repeat 60 default, and on Gentoo on the server. On the laptop, this runs normally for days with no apparent growth in process size. On the server, however, the process size grows over time. For this account, it starts around 200MB RSS after the initial sync, but it was up to 14GB before I killed it to free up RAM.

Using -repeat watch -watch also causes process growth, but the growth does not appear to be as fast. This is on a different account, so it would have different usage, but AFAICT it should have more changes happening, not fewer.

ps reports that unison started unison-fsmonitor on the server, despite it not being requested (or present) on the client. I don't know if this is relevant.

bruceg avatar Mar 27 '20 14:03 bruceg

Please update to 2.51.2 and retest; 2.48 is not supported. I'll leave this open anyway because, until someone tests with 2.51, the null hypothesis that the bug has not been fixed seems credible.

gdt avatar Sep 14 '20 12:09 gdt

I upgraded my desktop (a different system) to 2.51.2 (should have done it a while ago anyways). I ran it with -repeat 1 -watch and confirmed that the process growth still happens. In under 15 minutes it grew from 250MB RAM to 2GB RAM, and was continuing to grow.
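For anyone trying to reproduce this, a small sampler makes the growth easier to record than repeated ps runs. The following is a minimal sketch, assuming Linux (it reads /proc/<pid>/status); the helper names parse_rss_kb, read_rss_kb and sample_rss are made up for illustration and are not part of Unison.

```python
import re
import time


def parse_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_rss_kb(f.read())


def sample_rss(pid, interval_s, count):
    """Collect `count` RSS readings, sleeping `interval_s` seconds between them."""
    samples = []
    for i in range(count):
        samples.append(read_rss_kb(pid))
        if i < count - 1:
            time.sleep(interval_s)
    return samples
```

Called with the PID reported by pgrep unison and an interval matching the -repeat value, this should give roughly one reading per synchronization pass.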

bruceg avatar Sep 14 '20 15:09 bruceg

Thanks for testing with 2.51.2; I dropped the feedback label. It now remains for someone to find the leak....

gdt avatar Sep 14 '20 16:09 gdt

Since I am running on Gentoo with a custom ebuild, I can easily add any patches to help with testing. Unfortunately, I know very little OCaml, so I can't help much directly.

bruceg avatar Sep 14 '20 17:09 bruceg

#33 seems to be the same issue.

While investigating some other issues (for example #352), I've found that Unison's remoting code does appear to be consuming memory without releasing it. But when testing with -repeat, I have not been able to reproduce infinite growth in memory allocation, as the initially allocated memory gets reused.

Of course, I can't replicate the exact scenarios of the issues reported here, but with my own rather extensive testing so far, I am beginning to believe that the culprit may be the fsmonitor watcher (see also #284). (I'm testing on a platform which does not compile fsmonitor at all, so my environment is completely free of any fsmonitor and watcher side-effects.)

@bruceg can you test again, making sure that no fsmonitor processes are running? Don't use -repeat watch, and do use watch = false. If you have to, remove the fsmonitor executable completely. Another idea to try: if you could synchronize the same root locally on your machine, do you see the same kind of memory usage?

tleedjarv avatar Oct 11 '20 13:10 tleedjarv

Status update. A test has been running for about 8 hours now with -repeat 60. During this time there have been millions of file additions/removals and tens of thousands of file content updates over tens of GB. The total root size has been constant at 1.5 million files (this may be relevant in the context of the below).

Memory consumption has been completely stable. After reaching the maximum allocation level (which is pretty much after the first synchronization with one root empty), the RSS remains constant. There is absolutely no increase in allocated memory.

So, memory is not being eaten away, but it's also not being released. I don't know why memory is not being released. Perhaps it is something in Unison code, intentional or unintentional, perhaps it has something to do with OCaml GC.

Edit: After the first run, I tried to make the changes more varied so that the replicas would really change drastically over time. The outcome remains exactly the same. Maximum memory allocation is reached after completing synchronization of the maximum root size (whenever in the timeline that happens to be) and after that remains constant. This holds no matter how many changes there are in the replicas, including changes of several orders of magnitude in the number of files.
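The distinction drawn here, between memory that plateaus and memory that keeps growing, can be checked mechanically on a series of RSS readings. A rough sketch follows; the function name, the warm-up parameter and the 5% tolerance are arbitrary assumptions for illustration and not anything taken from Unison.

```python
def looks_like_plateau(samples_kb, warmup, tolerance=0.05):
    """Return True if RSS stays within `tolerance` of its post-warm-up level.

    samples_kb: RSS readings in kB, oldest first.
    warmup:     number of leading readings to ignore (e.g. the initial sync).
    tolerance:  allowed fractional growth above the first post-warm-up reading.
    """
    if warmup >= len(samples_kb):
        raise ValueError("need at least one reading after the warm-up window")
    steady = samples_kb[warmup:]
    baseline = steady[0]
    # A plateau means no later reading rises meaningfully above the baseline.
    return max(steady) <= baseline * (1 + tolerance)
```

Readings that level off after the first synchronization should pass this check, while a series that keeps climbing between passes should fail it. The threshold would need tuning for a real workload.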

tleedjarv avatar Oct 11 '20 17:10 tleedjarv

I started running unison -ui text -repeat 1 (1 second to make it exhibit the problem quicker) with watch = false in my config. Running ps $(pgrep unison) confirms no unison-fsmonitor process is running. In this configuration, the memory usage did indeed plateau and hold fairly steady after a few passes. Removing watch = false from the config file starts the unison-fsmonitor process, and this combination rapidly balloons far beyond the memory usage with no fsmonitor process, even though the fsmonitor isn't actually used to track changes (?)

Incidentally, including watch = false in the config but using -repeat watch results in an error after the first synchronization completes: Fatal error: No file monitoring helper program found

So I still think this is a bug, but its nature is certainly different than what was apparent when I initially reported it.

bruceg avatar Oct 14 '20 20:10 bruceg

I declare this bug to be about the fswatcher leak only.

gdt avatar Oct 15 '20 11:10 gdt

Could you test again with the latest version?

tleedjarv avatar Jul 27 '22 09:07 tleedjarv

Feedback timeout. Feel free to file a new issue with a repro recipe with 2.53.1 or newer.

gdt avatar Mar 19 '23 14:03 gdt