Memory leak

Open cosminadrianpopescu opened this issue 2 years ago • 14 comments

Thank you for taking the time to file this issue! Please follow the instructions and fill in the missing parts below the instructions, if it is meaningful. Try to be brief and concise.

In Case of Graphical or Performance Issues

  1. Delete the contents of /tmp/zellij-1000/zellij-log, i.e., with cd /tmp/zellij-1000/ and rm -fr zellij-log/
  2. Run zellij --debug
  3. Recreate your issue.
  4. Quit Zellij immediately with ctrl-q (your bug should ideally still be visible on screen)

Please attach the files that were created in /tmp/zellij-1000/zellij-log/ to the extent you are comfortable with.

Basic information

zellij --version: zellij 0.30.0
stty size: 46 197
uname -av or ver (Windows): Linux ip-### Wed Dec 16 22:44:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

List of programs you interact with as PROGRAM --version: output, cropped to what is meaningful. For example:
nvim --version: NVIM v0.5.0-dev+1299-g1c2e504d5 (used the appimage release)
alacritty --version: alacritty 0.7.2 (5ac8060b)

Further information

Reproduction steps, noticeable behavior, related issues, etc.

I have a long-running zellij instance to which I connect and disconnect often (more than once per day). After weeks of using the same session, I notice that the zellij server is using more than 2 GB of RAM. See the attached screenshot.

cosminadrianpopescu avatar Aug 01 '22 07:08 cosminadrianpopescu

[Screenshot: bug-zellij]

cosminadrianpopescu avatar Aug 01 '22 07:08 cosminadrianpopescu

Unfortunately, I don't have the logs from the session in the attached screenshot. If this is a blocker for investigating the issue, next time I will copy the logs before attaching another screenshot.

cosminadrianpopescu avatar Aug 01 '22 08:08 cosminadrianpopescu

I have a similar setup and was surprised to see zellij take 900 MB; I assume that is not expected?

olanod avatar Aug 30 '22 09:08 olanod

Thank you for reporting this!

Sorry we took so long to reply. I tried to reproduce this with the latest zellij version locally and I can confirm this behavior. That clearly shouldn't happen!

I have a suspicion about where this may originate. I'll investigate and get back to you once I've found something.

har7an avatar Sep 26 '22 11:09 har7an

So this turns out to be more subtle than I had imagined. But I have another question: Do you have a scrollback limit set? Or is your scrollback infinite?

har7an avatar Oct 13 '22 08:10 har7an

No, by default I don't have any limit set. Should I set one?

cosminadrianpopescu avatar Oct 24 '22 07:10 cosminadrianpopescu

I have been encountering this issue as well (I have seen Zellij take up > 1 GB of memory).

I have attempted to reproduce this synthetically by:

  1. opening a new tab
  2. running yes in the new tab for a few seconds
  3. opening a new tab and closing the current tab
  4. observing that the memory usage after step 3 is higher than before step 1.

Note: this may or may not be the same as the > 1 GB possible memory leak.

InnovativeInventor avatar Nov 20 '22 17:11 InnovativeInventor

Update: It appears that the memory leak may originate here: https://github.com/zellij-org/zellij/blob/5ad0429adc3bbb33b2578d3e699f2e61d6d0b940/zellij-client/src/stdin_handler.rs#L14. In particular, when I move this line into the loop block, the memory leak appears to go away. More investigation is needed, though, to confirm the cause and figure out a fix.

I'm not quite sure what zellij-client is supposed to be doing (or that event loop). Is there some documentation that provides a high-level overview of how zellij is structured?

InnovativeInventor avatar Nov 20 '22 21:11 InnovativeInventor

I can confirm that the memory leaks are related to the scrollback. The memory is not (fully) released when the pane or tab is closed. Test case:

  • open zellij
  • yes $(python -c "print('x' * 2000)") (in bash)
  • open new pane / tab
  • close first tab with the python command
  • memory consumption stays high
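To watch the server's memory while repeating this test case, one option on Linux is to read the VmRSS field of /proc/<pid>/status for the zellij server process. A minimal sketch, assuming Linux's procfs format (VmRSS is reported in kB); the helper name is hypothetical, and the PID is passed on the command line:

```rust
use std::env;
use std::fs;

/// Extract the VmRSS value (in kB, as procfs reports it) from the
/// contents of a /proc/<pid>/status file. Returns None if absent.
fn parse_vm_rss_kb(status: &str) -> Option<u64> {
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))?
        .split_whitespace() // ["VmRSS:", "<number>", "kB"]
        .nth(1)?
        .parse()
        .ok()
}

fn main() {
    // Usage: rss <pid>   (falls back to this process if no PID is given)
    let pid = env::args().nth(1).unwrap_or_else(|| "self".to_string());
    let path = format!("/proc/{pid}/status");
    match fs::read_to_string(&path) {
        Ok(status) => match parse_vm_rss_kb(&status) {
            Some(kb) => println!("{pid}: VmRSS = {kb} kB"),
            None => eprintln!("no VmRSS line in {path}"),
        },
        Err(e) => eprintln!("cannot read {path}: {e}"),
    }
}
```

Running it before opening the tab, after the `yes` command, and after closing the tab gives the three data points the test case compares.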

See also #2104

raphCode avatar Jan 19 '23 18:01 raphCode

> I can confirm that the memory leaks are related to the scrollback. The memory is not (fully) released when the pane or tab is closed.

I did a bit of testing on this, and I'm starting to doubt that this is really a memory leak. I basically repeated the test above 10 times in the same zellij session and saw memory usage stay pretty much constant. If the memory were actually leaked, I would have expected usage to go up 10x.

Some searching supports this idea (given that on Linux, Rust uses the system malloc by default): https://stackoverflow.com/questions/45538993/why-dont-memory-allocators-actively-return-freed-memory-to-the-os/45539066#45539066.

After switching to jemalloc, the same kinds of tests give different results: I observed memory usage drop significantly after closing a tab.
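One way to tell a real leak apart from an allocator that keeps freed memory around is to count live heap bytes from inside the program, independently of RSS. The sketch below is a hypothetical test harness, not zellij code: it wraps Rust's `System` allocator in a `#[global_allocator]` that tracks bytes currently allocated, then repeats an allocate-and-drop cycle 10 times, mirroring the repeated test described above. If live bytes return to the baseline after every cycle, nothing is leaked from the program's point of view, even if ps(1) still reports a high RSS.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Wraps the system allocator and counts bytes currently allocated.
/// This measures the program's live heap, not the process RSS.
struct CountingAlloc;

static LIVE_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Forward unchanged to the system allocator, then count on success.
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            LIVE_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE_BYTES.fetch_sub(layout.size(), Ordering::Relaxed);
    }
    // `realloc` and `alloc_zeroed` use the trait's default implementations,
    // which go through `alloc`/`dealloc` above, so the count stays correct.
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn live_bytes() -> usize {
    LIVE_BYTES.load(Ordering::Relaxed)
}

/// One "fill scrollback, close pane" cycle; returns the live heap size
/// measured while the scrollback still exists.
fn one_cycle() -> usize {
    // 10_000 lines of 2000 'x' characters, similar to the `yes` test case.
    let scrollback: Vec<String> = (0..10_000).map(|_| "x".repeat(2000)).collect();
    let peak = live_bytes();
    drop(scrollback); // "closing the pane" frees every line
    peak
}

fn main() {
    let baseline = live_bytes();
    for cycle in 1..=10 {
        let peak = one_cycle();
        let after = live_bytes();
        println!("cycle {cycle}: peak {peak} B, after close {after} B (baseline {baseline} B)");
    }
}
```

Whether the freed pages are also handed back to the OS (and RSS drops) is up to the underlying allocator, which is exactly where glibc malloc and jemalloc behave differently.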

tlinford avatar Feb 01 '23 09:02 tlinford

I think you are correct; I had not repeated my tests often enough to average out the somewhat nondeterministic behavior of the allocator. I played around and found no definite signs of leaks when closing panes.

raphCode avatar Feb 01 '23 11:02 raphCode

layout {
    pane size=1 borderless=true {
        plugin location="zellij:tab-bar"
    }
    
    pane split_direction="vertical" {
        pane split_direction="vertical" size="80%"
        pane split_direction="vertical"
    }
    pane split_direction="horizontal" {
        pane split_direction="horizontal" size="80%"
        pane split_direction="vertical"
    }
    
    pane size=2 borderless=true {
        plugin location="zellij:status-bar"
    }
}

This layout causes zellij to rapidly consume all memory on the system. The condition is recoverable only because I have 64 GB of RAM and it takes a while to fill that much up; it reaches 20 GB in seconds.

It is unresponsive to SIGINT and SIGQUIT and has to be killed manually, and quickly.

$ stty size
45 167

DianaNites avatar Apr 23 '23 00:04 DianaNites

> this layout causes zellij to rapidly consume all memory on the system

moved to https://github.com/zellij-org/zellij/issues/2407

This issue is more about memory usage in sessions that are long running and/or had a lot of tabs / panes / scrollback lines.

raphCode avatar Apr 25 '23 10:04 raphCode

When I have btop running in its own tab, zellij will gobble up 3–5 GB RAM in a matter of a couple of days.

  • zellij: 0.36.0
  • btop: 1.2.13
  • O/S: Linux 6.3 (x86_64)

kseistrup avatar May 20 '23 10:05 kseistrup

This is a comment regarding a sub-thread on HN where a possible memory leak is mentioned.

TL;DR: Zellij v0.37.2 is still leaking memory with btop running in a separate tab.

Setup

In a freshly launched zellij with a fairly default configuration (I believe I changed the mouse value only) I opened 3 tabs:

  1. a working shell
  2. lnav log file navigator (probably irrelevant in this case)
  3. btop

In all three tabs I exec'ed into the running program (e.g., exec btop in tab 3).

To capture zellij's memory consumption I ran ps(1) every 71 seconds:

while :; do
  ps faux \
  | rg '[z]ellij --server' \
  | timestamp \
  | tee -a zellij-memory-leak.txt
  # sleep for 1m 11s
  sleep 71
done

Results

[…]
2023-06-27 19:51:32	kas       894490  2.2  6.1 38396436 125172 ?     Sl   19:40   0:14 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
2023-06-27 19:52:43	kas       894490  2.1  6.3 38412820 128628 ?     Sl   19:40   0:15 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
2023-06-27 19:53:54	kas       894490  1.9  6.4 38412820 130676 ?     Sl   19:40   0:15 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
[…]
2023-06-28 06:16:43	kas       894490  0.6 61.3 40483420 1238556 ?    Sl   Jun27   4:05 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
2023-06-28 06:17:54	kas       894490  0.6 61.4 40483420 1240604 ?    Sl   Jun27   4:06 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
2023-06-28 06:19:05	kas       894490  0.6 62.2 53066496 1257172 ?    Sl   Jun27   4:07 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain
[…]
[ btop is killed, and then:]
[…]
2023-06-28 06:21:56	kas       894490  0.6  4.6 13220532 94736 ?      Sl   Jun27   4:10 /usr/bin/zellij --server /run/user/1000/zellij/0.37.2/fertile-curtain

Initially zellij was using ~93 MB of RAM, which immediately climbed in roughly 2 kB increments for each loop above. Interestingly, it plateaued at 148 MB after 20 minutes and stayed there for almost half an hour, but then it continued climbing (except on 5 occasions during the test period, when memory consumption decreased by between 1_664 B and 41 kB from one loop round to the next).

The loop ran from 2023-06-27 @ 19:40 to 2023-06-28 @ 06:19 (i.e., 10h 39m), and at the end zellij was consuming 1_228 MB RAM (1.2 GB), all of which was released when the btop tab was closed.
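As a rough cross-check of the units, assuming ps(1) reports RSS in KiB (as procps documents) and that the loop ran every 71 s without interruption, the totals above imply growth on the order of 2 MB per iteration:

$$
N \approx \frac{10\,\mathrm{h}\;39\,\mathrm{min}}{71\,\mathrm{s}} = \frac{38\,340\,\mathrm{s}}{71\,\mathrm{s}} = 540,
\qquad
\frac{1\,228\,\mathrm{MB} - 93\,\mathrm{MB}}{540} \approx 2.1\,\mathrm{MB}\ \text{per loop}
$$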

Notes

I probably wouldn't notice this on a modern machine with lots of RAM, but on a small VPS with 2 GB of RAM or less, zellij will eventually be killed by the system's OOM killer. For that reason I went back to using tmux on remote machines, because zellij would be killed after a day or two since I usually run btop in a separate tab.

Non-standard software used:

PS: The memory I am referring to is the RSS size as reported by ps. The VSZ increased from 36 GB to 51 GB during the test period, and went down to 13 GB when tabs 2 and 3 were closed.

kseistrup avatar Jun 28 '23 07:06 kseistrup

PS:

$ stty size
50 211
$ uname -av
Linux fyhav 6.3.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000 x86_64 GNU/Linux

The whole thing is running on a 2 GB Linode VPS with 1 CPU running an ArchLinux installation.

I'm not comfortable with attaching zellij's debug logs as they seem to include everything that zellij has seen during its entire runtime.

(I reran the loop in a zellij --debug instance, rather than the zellij used above, but I don't see any difference: it's still eating memory.)

kseistrup avatar Jun 28 '23 08:06 kseistrup

Hey, just to confirm: I'm reproducing this with the btop tab. Thanks for the detailed reproduction. I'll poke around and see what's up, hopefully some time this month.

imsnif avatar Jun 28 '23 11:06 imsnif

Alright, so I issued a fix for this in #2675; it will be released in the next version. Thank you very much @kseistrup for the reproduction.

This issue has become a bit of a grab bag for memory issues: some of them are actual issues, others are symptoms of other issues (e.g., things we need to give an error about instead of crashing). So I'm going to close it after this fix. If anyone is still experiencing memory issues (starting from the next release), please open a separate issue and be sure to provide a detailed and minimal reproduction. Thanks everyone!

imsnif avatar Aug 04 '23 08:08 imsnif

I'm afraid the changes made in #2675 haven't had any effect on the memory leak when running btop in a separate tab (please note that there are two instances of zellij running in the log below: vitreous-lemon (PID 2730169) had a tab with btop, as before, while excellent-galaxy (PID 2730826) had a tab with just a tty-based RSS reader):

2023-08-28 10:57:06	kas      2730169  8.4  0.7 50470704 85532 ?      Sl   10:51   0:26  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 10:57:06	kas      2730826  1.5  0.4 12687136 56976 ?      Sl   10:52   0:04  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
2023-08-28 10:58:17	kas      2730169  7.3  0.7 50476792 91100 ?      Sl   10:51   0:28  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 10:58:17	kas      2730826  1.5  0.5 12687136 61432 ?      Sl   10:52   0:04  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
2023-08-28 10:59:28	kas      2730169  6.6  0.8 50478992 93504 ?      Sl   10:51   0:30  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 10:59:28	kas      2730826  1.4  0.5 12687136 61432 ?      Sl   10:52   0:05  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
[…]
2023-08-28 12:16:28	kas      2730169  3.0  2.2 50724728 262328 ?     Sl   10:51   2:34  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 12:16:28	kas      2730826  1.2  0.5 12687200 62464 ?      Sl   10:52   1:02  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
2023-08-28 12:17:39	kas      2730169  3.0  2.2 50724728 264840 ?     Sl   10:51   2:36  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 12:17:39	kas      2730826  1.2  0.5 12687200 62464 ?      Sl   10:52   1:03  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
2023-08-28 12:18:50	kas      2730169  3.0  2.3 50724728 268596 ?     Sl   10:51   2:38  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 12:18:50	kas      2730826  1.2  0.5 12687200 62464 ?      Sl   10:52   1:04  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy
[ btop is killed, and then: ]
2023-08-28 12:20:01	kas      2730169  3.0  0.6 37873300 79588 ?      Sl   10:51   2:40  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/vitreous-lemon
2023-08-28 12:20:01	kas      2730826  1.2  0.5 12687168 62336 ?      Sl   10:52   1:05  \_ /usr/bin/zellij --server /run/user/1000/zellij/0.38.0/excellent-galaxy

I only ran the loop for slightly more than an hour, but the results are clear:

For every loop an average of 2653 bytes is lost (min: 1044, max: 5568), which is at least as much as in v0.37.2, and possibly a little worse.
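The per-loop figures can be computed from the logged ps output. A minimal sketch, assuming the column layout shown in the logs (date, time, USER, PID, %CPU, %MEM, VSZ, RSS, ...) and that RSS is in KiB, as procps ps(1) documents. Lines that do not parse, such as the "[…]" markers, are treated as gaps so no delta is taken across them, and lines from other server PIDs are skipped; the function names are hypothetical.

```rust
/// Parses (PID, RSS in KiB) from one line of the timestamped `ps faux` log.
/// Assumed layout: date, time, USER, PID, %CPU, %MEM, VSZ, RSS, ...
fn parse_pid_rss(line: &str) -> Option<(u32, u64)> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    let pid = fields.get(3)?.parse().ok()?;
    let rss = fields.get(7)?.parse().ok()?;
    Some((pid, rss))
}

/// RSS change (KiB) between consecutive samples of `pid`.
/// Unparsable lines (e.g. "[…]") break the sequence; other PIDs are ignored.
fn rss_deltas_kib(log: &str, pid: u32) -> Vec<i64> {
    let samples: Vec<Option<u64>> = log
        .lines()
        .filter_map(|line| match parse_pid_rss(line) {
            Some((p, rss)) if p == pid => Some(Some(rss)), // sample for our server
            Some(_) => None,                               // another server: skip
            None => Some(None),                            // gap marker
        })
        .collect();
    samples
        .windows(2)
        .filter_map(|w| match (w[0], w[1]) {
            (Some(a), Some(b)) => Some(b as i64 - a as i64),
            _ => None,
        })
        .collect()
}

fn main() {
    // Consecutive samples from the v0.38.0 log above (command column trimmed).
    let log = "\
2023-08-28 10:57:06\tkas 2730169 8.4 0.7 50470704 85532 ? Sl 10:51 0:26
2023-08-28 10:57:06\tkas 2730826 1.5 0.4 12687136 56976 ? Sl 10:52 0:04
2023-08-28 10:58:17\tkas 2730169 7.3 0.7 50476792 91100 ? Sl 10:51 0:28
2023-08-28 10:58:17\tkas 2730826 1.5 0.5 12687136 61432 ? Sl 10:52 0:04
2023-08-28 10:59:28\tkas 2730169 6.6 0.8 50478992 93504 ? Sl 10:51 0:30";
    let deltas = rss_deltas_kib(log, 2730169);
    let min = deltas.iter().min().unwrap();
    let max = deltas.iter().max().unwrap();
    let avg = deltas.iter().sum::<i64>() as f64 / deltas.len() as f64;
    println!("deltas (KiB): {deltas:?}; min {min}, max {max}, average {avg:.0}");
}
```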

(The zellij without a btop tab (excellent-galaxy) is largely unaffected. Sometimes its memory consumption goes up by a few bytes, sometimes it goes down.)

Zellij v0.38.0 hasn't reached my Linux distribution yet, so I used the pre-compiled zellij-x86_64-unknown-linux-musl.tar.gz binary from the Releases page here on Microsoft GitHub.

kseistrup avatar Aug 28 '23 10:08 kseistrup

@kseistrup - thanks for giving this a test so quickly. I spent some time on this issue in the past couple of months, and for me the fixes I made solved the problem with the separate btop tab. I'm sorry to hear it didn't for you.

I am growing suspicious that this is caused by the behavior of the Rust allocator, which might be a bit too greedy about holding on to memory instead of releasing it. If you are comfortable compiling the repo from source, I'm wondering whether you'd be able to reproduce the issue with a different allocator: https://github.com/zellij-org/zellij/compare/main...jemalloc

EDIT: You might want to wait a day until I finish updating CONTRIBUTING.md with new build instructions since we added some stuff. Unless you don't mind the build-tool error-and-install game.

imsnif avatar Aug 28 '23 12:08 imsnif

@imsnif

I'm not sure how a btop tab can be an issue here and not there. Does btop show colours when you run it? I always imagined that zellij blew up because of the gazillions of ANSI colour codes that btop is spewing out. But perhaps I lack imagination…

I also thought I wouldn't be able to compile zellij myself because it needed newer features than Arch Linux's Rust version provides, but it seems I can compile it, and it's only two tiny changes. I'll be back when I know more.

kseistrup avatar Aug 28 '23 14:08 kseistrup

> I'm not sure how a btop tab can be an issue here and not there. Does btop show colours when you run it? I always imagined that zellij blew up because of the gazillions of ANSI colour codes that btop is spewing out. But perhaps I lack imagination…

Honestly I'm also baffled. I've been around this area with a very fine comb and the only thing I found to help is the linked fix in the recent version. We keep the colors in a stack allocated structure that's essentially an elaborate "terminal character configuration". It's the same size for each character whether you have styles or not (and it's really not that big).
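To illustrate the point about fixed-size characters, here is a minimal sketch with hypothetical types (`Cell`, `Styles` and `Color` are invented for this example, not zellij's actual definitions). Because colors and attributes live in plain fields of a `Copy` struct, a styled cell occupies exactly as many bytes as an unstyled one, so a line's memory depends on its width, not on how many color escape sequences produced it.

```rust
use std::mem::{size_of, size_of_val};

/// Hypothetical color representation; every variant fits in the same enum size.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Color {
    Default,
    Indexed(u8),
    Rgb(u8, u8, u8),
}

/// Hypothetical per-character style block, stored inline (no heap allocation).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Styles {
    fg: Color,
    bg: Color,
    bold: bool,
    underline: bool,
}

/// One terminal cell: the character plus its styles.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Cell {
    ch: char,
    styles: Styles,
}

const PLAIN: Styles = Styles { fg: Color::Default, bg: Color::Default, bold: false, underline: false };

/// Bytes needed for the cells of one line of `width` columns, styled or not.
fn line_bytes(width: usize) -> usize {
    width * size_of::<Cell>()
}

fn main() {
    let plain = Cell { ch: 'x', styles: PLAIN };
    let colorful = Cell {
        ch: 'x',
        styles: Styles { fg: Color::Rgb(255, 0, 0), bg: Color::Indexed(236), bold: true, underline: true },
    };
    // Same type, same size: styling does not change the per-cell footprint.
    assert_eq!(size_of_val(&plain), size_of_val(&colorful));
    println!("{} bytes per cell, {} bytes for a 211-column line", size_of::<Cell>(), line_bytes(211));
}
```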

I thought this was somehow us not dropping (eventually deallocating) the terminal lines, but that also wasn't it. The particularities of this issue across machines and setups are the reason I currently suspect the allocator. Let's see.

imsnif avatar Aug 28 '23 14:08 imsnif

The jemalloc variant started out promising: it immediately climbed from 132 MB to 162 MB and stayed there for almost 20 minutes, so that looked great. But after one hour we're at 226 MB and climbing with every loop.

I'll let it run until tomorrow to see if some sort of garbage collection takes place at some point.

kseistrup avatar Aug 28 '23 16:08 kseistrup

PS: I believe I've been mixing up B, kB and MB. ps(1) shows its values in kB, doesn't it? So when I said earlier that we're losing 2653 bytes per loop, I think I meant 2653 kB. But whatever the unit, we're losing memory with every loop (at the rate shown by ps, whichever unit that really is). Sorry for being unclear.

kseistrup avatar Aug 28 '23 16:08 kseistrup

I may have found something - the key seems to be having the tab open but not being rendered!

@kseistrup would you be able to give #2745 a try?

I did a quick test and got:

  • #2745: starting mem: 76248, after ~30min: 77400
  • main: starting mem: 74072, after ~30min: 106596
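A purely hypothetical model of how a tab that is open but not rendered could keep growing (this is not zellij's code, and every name is invented for illustration): if each line change is queued and the queue is drained only when the tab is drawn, a background tab running a program that repaints constantly, like btop, accumulates entries without bound. Tracking a set of dirty rows instead caps the buffer at the pane height.

```rust
use std::collections::BTreeSet;

/// Hypothetical naive buffer: one queued entry per line change,
/// emptied only when the tab is rendered.
#[derive(Default)]
struct NaiveBuffer {
    queued: Vec<usize>,
}

/// Hypothetical bounded buffer: remembers *which* rows are dirty,
/// so it can never hold more entries than the pane has rows.
#[derive(Default)]
struct DirtyLines {
    dirty: BTreeSet<usize>,
}

impl NaiveBuffer {
    fn line_changed(&mut self, row: usize) {
        self.queued.push(row);
    }
    fn render(&mut self) -> Vec<usize> {
        std::mem::take(&mut self.queued)
    }
}

impl DirtyLines {
    fn line_changed(&mut self, row: usize) {
        self.dirty.insert(row);
    }
    fn render(&mut self) -> Vec<usize> {
        std::mem::take(&mut self.dirty).into_iter().collect()
    }
}

/// A full-screen app repaints every row each frame; the tab is never
/// rendered during the simulation, so neither buffer is drained.
fn simulate_background_tab(rows: usize, frames: usize) -> (NaiveBuffer, DirtyLines) {
    let (mut naive, mut bounded) = (NaiveBuffer::default(), DirtyLines::default());
    for _ in 0..frames {
        for row in 0..rows {
            naive.line_changed(row);
            bounded.line_changed(row);
        }
    }
    (naive, bounded)
}

fn main() {
    let (mut naive, mut bounded) = simulate_background_tab(50, 1_000);
    println!("unrendered: naive = {} entries, bounded = {} rows", naive.queued.len(), bounded.dirty.len());
    // Switching to the tab finally renders it, which drains both buffers.
    let (n, b) = (naive.render().len(), bounded.render().len());
    println!("after render: naive drew {n} entries, bounded drew {b} rows");
}
```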

tlinford avatar Aug 28 '23 19:08 tlinford

@tlinford

I'm not a developer so I hope I got it right, but I believe I have your branch up and running now. I'll be back as soon as I see a trend.

@imsnif

Even the jemalloc approach blew up during the night: I woke up to a memory consumption that was 10 times what it was when I went to bed.

kseistrup avatar Aug 29 '23 07:08 kseistrup

TL;DR: The @tlinford branch does seem to act differently, but it is still leaking memory here.

First I thought it had stabilised at 154_632 (whatever unit ps serves), then at 169_196, before it seemed to settle at 247_544 — at which point I was going to report the results. But after more than 2 hours it climbed to 270_236. For unrelated reasons I had to stop the experiment at that time, and when I took down btop the memory consumption immediately dropped to 157_956.

It doesn't seem to matter which tab is active: in stable periods, switching away from btop changes nothing, and in unstable periods, switching to btop changes nothing. In short, the only thing I can reliably correlate with memory changes is whether btop is running.

I hope you guys can make more sense of it than I can.

kseistrup avatar Aug 29 '23 13:08 kseistrup

Thanks for checking it out! I'll keep investigating :)

tlinford avatar Aug 29 '23 15:08 tlinford

It's really a strange bug. I have been monitoring another instance (with your branch) since I wrote my previous reply: it rose rather fast to 303908 and has stayed there since. I'm running the same tabs each time, so I find it very strange that the behaviour can be so different… Computer programs are never creative, so there must be something that triggers the leak.

kseistrup avatar Aug 29 '23 15:08 kseistrup

> Computer programs are never creative, so there must be something that triggers the leak.

This is what made me suspect the allocator, which in Rust's case doesn't necessarily release memory back to the system as soon as the relevant data is dropped, so its behavior can depend on the whole state of its memory space (the screen thread in this case). But it's really just guessing. We're chipping away at this (I think @tlinford found a really interesting leak in the output buffer), and I trust him to find whatever can be found here.

We might eventually "explain" this away, but I think it's a good idea to try and track down every stray byte we cannot explain at the very least. Thank you very much for helping us out @kseistrup !

imsnif avatar Aug 29 '23 16:08 imsnif