Excessive swapping or limited swap space making VMs unresponsive
The problem you're addressing (if any)
Two problems:
- Switching to a VM that has been running for a while and having to wait 30+ seconds for a response, because it started swapping while you weren't using it.
- Switching to a VM that has been running for a while and finding it totally unresponsive, because it started swapping and then filled its swap while you weren't using it.
Describe the solution you'd like
When clicking the Qubes "Q" icon, it shows the memory used by each VM from a memory-balancing perspective. One could put a memory number there.
Exactly what number would be best to put there is not entirely clear. The memory stats displayed by the "free -h" command could be used to show free memory inside the VM divided by available memory inside the VM.
However, due to memory balancing, this is not the whole story. Perhaps free memory inside the VM (as displayed by the "free" command) divided by the maximum memory the memory balancer is willing to provide it?
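For reference, the guest-side numbers in question come from something like this, run inside the qube:

```bash
# The guest's view of its memory. Note that "total" changes over time
# as the memory balancer inflates/deflates the balloon, so this alone
# cannot show the balancer's maximum.
free -h
```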
Where is the value to a user, and who might that user be?
Not having to wait 30+ seconds for responses, and not losing data that you hadn't yet gotten off a disposable VM because that VM is entirely unresponsive.
Describe alternatives you've considered
Normally one would put a system-monitor applet in the task bar that shows memory consumption. However, this is unlikely to be a usable solution for Qubes, both because there would end up being too many monitors in the task bar and for security reasons.
Additional context
Relevant documentation you've consulted
Related, non-duplicate issues
> When clicking the Qubes "Q" icon, it shows the memory used by each VM from a memory-balancing perspective. One could put a memory number there.
How is this different from the memory values the Qube Manager already shows for each VM?
I have personally never experienced this problem, nor can I recall hearing others report it. Is it because Qubes is installed on an HDD rather than an SSD?
In any case, implementing a tool for users to manually monitor the problem sounds much less promising than finding the underlying root cause and fixing it.
The underlying cause is opening "too many" Firefox tabs in a VM because you don't know how many is "too many" (i.e., how much free memory is left inside the machine). The same goes for running "too many" applications in a VM, but Firefox (and especially Tor Browser) is more likely to bloat and become unresponsive while you are away.
Increasing the amount of memory does not help, because it just increases the number of tabs (or applications) you can have open; the user still doesn't know when they need to stop.
@ddevz We actually did propose some of what you're suggesting in two prototypes for a new App Menu custom-built for Qubes, tracked in #5677. In the survey to gauge user sentiment regarding feature options (#6573), this feature measured very well, with over 2/3 of users responding positively to it. It's not prioritized for an initial release, though, as our goal for that release is just to get the fully new widget onto users' machines for feedback from use.
The data those prototypes propose only mirrors what the existing Domains widget (in 4.0, the Q menu on the right side of the screen, in the tray area) provides. Do those existing metrics provide the insight you're looking for (but obviously presented more contextually, so you have more natural visibility into it)?
> The data those prototypes propose only mirrors what the existing Domains widget (in 4.0, the Q menu on the right side of the screen, in the tray area) provides. Do those existing metrics provide the insight you're looking for (but obviously presented more contextually, so you have more natural visibility into it)?
If I understand your question properly, then no, those metrics do not provide sufficient insight. The 4.0 "Q" widget on the right shows how much memory the memory balancer has allocated to each VM. This is valuable information, as it helps answer "how many more VMs can I create before I can't start VMs anymore (due to being out of memory)?" and "which VMs do I need to kill to free up enough memory to start the VM I want to start?".
In this issue I'm trying to answer questions like "how many more tabs can I open in this specific VM before the machine starts swapping/runs out of swap?"
So the metrics one would need are:
- The max memory that the memory balancer is willing to give that VM. An example of this metric: instead of the "Q" widget saying "400 MB" for a VM, it could say "400 of 3983 MB" or "400/3983 MB" (see the sketch after the notes below).
Notes:
- This information is also useful for people trying to manage the available Xen memory.
- I suspect this information would be easy to get to the GUI.
- This information does not appear to be available inside the virtual machine: running the "free" command inside the VM shows the "total memory" changing over time as the memory balancer does its balancing.
- The amount of memory actually used inside that VM (or something that could be used to compute it, like the "free" and "total" numbers). (Note: I suspect this information would be harder to get to the GUI.)
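For what it's worth, the first metric looks reachable from dom0 already; a rough sketch of the proposed "400/3983 MB" display, assuming the stock qvm-* and xl tools and no error handling:

```bash
# dom0 sketch: current allocation vs. the balancer's per-VM ceiling.
# xl's third column is the domain's current memory in MiB; qvm-prefs
# maxmem is the most the balancer will give the qube.
for vm in $(qvm-ls --running --raw-list | grep -vx dom0); do
    max=$(qvm-prefs "$vm" maxmem)
    cur=$(xl list "$vm" 2>/dev/null | awk 'NR==2 {print $3}')
    echo "${vm}: ${cur:-?} / ${max} MiB"
done
```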
> I have personally never experienced this problem, nor can I recall hearing others report it. Is it because Qubes is installed on an HDD rather than an SSD?
I've run into this quite often on an SSD, in VMs set up primarily for research, which tends to require a lot of open tabs. If I also have a few other VMs open, memory pressure can leave that VM stuck in perma-swap and unresponsive. It would be nice, particularly for browser-heavy VMs based on Linux templates, to be able to control the (max) size of the volatile volume and the swap size from the qube settings, and have the VM boot process use those values.
[Above is responsive to your question...but stopping there to return to topic.]
> It would be nice, particularly for browser-heavy VMs based on Linux templates, to be able to control the (max) size of the volatile volume and the swap size from the qube settings, and have the VM boot process use those values.
If I'm understanding correctly, it sounds like we have two different (non-mutually-exclusive) proposals that both aim to address the same underlying problem: the "memory monitoring" approach and the "swap config" approach.
I do not see how "swap config" would solve the problem. Can you elaborate on how one might use swap config to resolve the issue? I'm totally speculating here... maybe you're thinking of setting swap to zero so the process dies instead of the VM becoming unresponsive?
> I do not see how "swap config" would solve the problem. Can you elaborate on how one might use swap config to resolve the issue? I'm totally speculating here... maybe you're thinking of setting swap to zero so the process dies instead of the VM becoming unresponsive?
It appears that in the standard Qubes Linux templates, all swap is hard-coded to "up to 1 GB" per VM on the volatile volume.
This is different from traditional settings for Linux installs; e.g., in the Fedora 28 installation guide, I see:

[Quoted table of recommended swap sizes; for 8 GB of RAM it suggests at least 4 GB of swap, or 12 GB with hibernation.]
So, in the above, if I have a VM set up with 8 GB of RAM (whether 8/8 without memory sharing or 0.8/8 with), the traditional suggested swap size, were it not a VM, is between 4 GB and 12 GB. Not 1 GB.
I'll stop digging there, because what is really needed is user-behavior-focused testing of the interactions between Xen, memory sharing, and swap for image- and memory-hungry applications such as Firefox, to reach the right balance of settings, especially under system-wide memory pressure.
The lack of user-exposed local swap values in the VM settings makes experimentation more difficult. I suppose we could perform some initial testing with swap files on the private volumes in the short term (with caveats).
Also, my theory, which may be wrong, is that memory pressure combined with limited swap space is causing the issue. This is different from your theory, which is that swapping itself is causing the issue.
> Also, my theory, which may be wrong, is that memory pressure combined with limited swap space is causing the issue. This is different from your theory, which is that swapping itself is causing the issue.
I understand now! My theory is that swapping itself is causing the "have to wait 30+ seconds to get a response" problem, and that running out of swap space is causing the "totally unresponsive (i.e., can't use the terminal anymore and have to kill the VM)" problem.
If I am correct, then your idea of expanding the swap space could fix the "totally unresponsive" problem, but not the "have to wait 30+ seconds" problem.
(And note that when having to wait 30+ seconds, a user won't be able to stand opening many more tabs or launching many more applications.)
If running out of swap space is causing both problems 1 and 2, then expanding the swap space would fix neither, as the user would continue to open tabs and launch applications until the same thing happened.
However, being able to expand the swap space does sound important for users with limited overall system memory (i.e., the total memory that Xen gets to use) who want to run VMs that go beyond what their system memory can handle. (I added extra memory to my system and had forgotten about this case :) )
So to me it sounds like we are solving separate (but related) problems that lead to the same symptoms.
Adding more swap to a VM that runs Firefox also avoids the "have to wait 30+ seconds" problem in most cases. This helps for Firefox specifically, because a lot of the memory it allocates is never released (aka memory leaks). While ideally applications wouldn't leak that much memory (which is hard in something as complex as a web browser...), adding more swap helps to mitigate the issue, or at least significantly delay it.
Each VM has a 10 GB volatile volume (/dev/xvdc). By default, only 1 GB is assigned for swap, and the rest is unused. It is reserved for a copy-on-write layer over a read-only root filesystem, but nowadays the VM sees a read-write root filesystem (the copy-on-write layer is applied at the dom0 level). We might use it again, though, as part of #1293 and #904. Until then, there is /dev/xvdc3, which is the unused space - you can easily do sudo mkswap /dev/xvdc3 && sudo swapon /dev/xvdc3 to use it for swap. Maybe add it to /rw/config/rc.local. If not for the plan to start using most of the volatile volume again, we could have an option in VM settings for that.
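For example, a minimal /rw/config/rc.local snippet along those lines (assuming /dev/xvdc3 is still the unused part of the volatile volume, as described above):

```bash
# /rw/config/rc.local runs as root at qube startup, so no sudo is needed.
if [ -b /dev/xvdc3 ]; then
    mkswap /dev/xvdc3
    swapon /dev/xvdc3
fi
```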
A word of caution: using a big swap does help for Firefox, which leaks a lot of memory, but only because most of that memory is never accessed again. If an application really uses a lot of memory, then having more swap will actually make the situation worse: if more swap is used (and accessed frequently), it will make the application even slower, resulting in unresponsiveness lasting much longer than 30 seconds.
> Adding more swap to a VM that runs Firefox also avoids the "have to wait 30+ seconds" problem in most cases. This helps for Firefox specifically, because a lot of the memory it allocates is never released (aka memory leaks). While ideally applications wouldn't leak that much memory (which is hard in something as complex as a web browser...), adding more swap helps to mitigate the issue, or at least significantly delay it.
I thought Firefox had fixed its leaks many years ago 😞.
> I thought Firefox had fixed its leaks many years ago 😞.
I believe Firefox did fix its memory leaks (at least the majority of them). Maybe because they partly migrated to a proper language (Rust) that prevents the majority of memory-related issues, including missed de-allocations.
As of today, Firefox consumes far less memory than Chromium/Chrome with the same number of tabs loaded, at least on a normal PC running GNU/Linux or Windows.
Maybe the problem is actually that Firefox allocates too much memory because it sees that a lot of memory is available (to be faster and utilize the extra free memory), and does not free it aggressively, since it may be reused later.
I had been assuming that the memory leaks were in the JavaScript of the various pages you have open, as the behavior seems highly dependent on which sites you have open.
This issue is being closed because:
- This issue is on the "Release 4.0 updates" milestone.
- Qubes OS 4.0 reached EOL (end-of-life) over one year ago.
- There has not been any activity on this issue in over one year.
If anyone believes that this issue should be reopened and reassigned to an active milestone, please leave a brief comment. (For example, if a bug still affects Qubes OS 4.1, then the comment "Affects 4.1" will suffice.)
This is still a problem, and it happens quite a lot: the VM freezes to death if you do not check in time to restart Firefox and free up space. swapoff does not help, as the VM can still become unresponsive over time. Please advise.
> This is still a problem, and it happens quite a lot: the VM freezes to death if you do not check in time to restart Firefox and free up space. swapoff does not help, as the VM can still become unresponsive over time. Please advise.
I am using a relatively recent install of the 4.2.2 image. The PC is limited to 16 GB of RAM; with all the rest of the active VMs, I tried to make it work with 2.1 GB of RAM. The VMs that get stuck all have 30-50 tabs open without Java active. Turning off swap helped insofar as you would notice running into the bottleneck from the response time and could close Firefox beforehand, but that does not always work: when it happens to disposable VMs you lose your work, since there is no option to restart. With all the VMs I usually uncheck "include in memory balancing", which is probably a factor. With the same use case on a live system (Porteus) with 8 GB of RAM and no swap, it never seems to be an issue, but I cannot assign that much.
> With all the VMs I usually uncheck "include in memory balancing"
Do you also adjust their memory size ("initial memory")? The default initial memory is 400 MB, which is definitely not enough to run Firefox or similar, and without memory balancing the VM will not get more RAM.
> With all the VMs I usually uncheck "include in memory balancing"
> Do you also adjust their memory size ("initial memory")? The default initial memory is 400 MB, which is definitely not enough to run Firefox or similar, and without memory balancing the VM will not get more RAM.
Yes. The VMs that are the problem for me are disposable VMs; in the last attempt I set "initial memory" to 2100 MB in "default-dvm", in the hope that it will do. Unchecking "include in memory balancing" is kind of a habit... only really thinking about it now... I would have to see how it turns out, but since my RAM is rather limited and I use 10+ VMs of different RAM sizes at the same time, chances are the swap space would still fill up over time, which was the original issue.
Well, swap will fill up if the VM has less RAM than the applications running inside it try to use. Yes, web browsers are extremely memory-hungry applications. Check top to see what is using how much memory.
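For instance, inside the affected qube:

```bash
# Sort processes by resident memory (or press Shift+M inside plain top):
top -o %MEM
# One-shot snapshot of the ten largest processes by RSS:
ps aux --sort=-rss | head -n 10
```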
> I've run into this quite often on an SSD, in VMs set up primarily for research, which tends to require a lot of open tabs. If I also have a few other VMs open, memory pressure can leave that VM stuck in perma-swap and unresponsive. It would be nice, particularly for browser-heavy VMs based on Linux templates, to be able to control the (max) size of the volatile volume and the swap size from the qube settings, and have the VM boot process use those values.
> Also, my theory, which may be wrong, is that memory pressure combined with limited swap space is causing the issue. This is different from your theory, which is that swapping itself is causing the issue.
One of the questions was whether it can be helped with adjustable swap space, since others have encountered it (on older Qubes versions, but the same issue), all with - presumably - "include in memory balancing" enabled. I feel that 1 GB of swap may indeed be too little on some VMs (with Firefox use), and that you would run into it over a long enough period of time when you are not able to give the VM 4-8 GB of RAM, but with more swap it could do.
Using your solution from Jun 2, 2021: will that always be possible in the future? /dev/xvdc is 12 GiB now, but just executing the commands seems to work fine, and the swap shows up in top/htop.
On the chance that you might know how to do bash scripting: it's possible to put a script in cron in the qube that you are worried is going to run out of memory, set it to run every 5 minutes, and have it notify you when memory is getting low, using something like:
notify-send --expire-time=360000 'RUNNING OUT OF MEMORY IN VM'
You can get the free memory in the qube with something like:
free -m | grep '^Mem' | awk '{print $7}'
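Putting those two pieces together, a minimal sketch of such a watchdog (the threshold, the script path, and the session-bus path used so notify-send works under cron are all assumptions to adjust):

```bash
#!/bin/bash
# Hypothetical low-memory watchdog for a qube. Install with a crontab
# entry such as: */5 * * * * /home/user/memwatch.sh (path is illustrative).
THRESHOLD_MB=300   # assumed threshold; tune for your workload

# cron runs without a session environment; point notify-send at the
# user's session bus (assumes a single logged-in user with UID 1000).
export DISPLAY=:0
export DBUS_SESSION_BUS_ADDRESS="unix:path=/run/user/1000/bus"

avail=$(free -m | awk '/^Mem/ {print $7}')
if [ "$avail" -lt "$THRESHOLD_MB" ]; then
    notify-send --expire-time=360000 "Low memory in this qube: ${avail} MiB available"
fi
```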
@horizon2021: Please try resetting all initial and max memory settings to their defaults (i.e., 400 MB / 4000 MB) and enabling "include in memory balancing" on all qubes where it's possible. (In other words, put all memory-related settings back to the defaults.) Then reboot the whole system and try using it for a while to see whether the problem persists.
> On the chance that you might know how to do bash scripting:
Thanks. With the tutorials out there I could probably get this done; right now (whatever the security concerns are), just creating a new swap file seems like the easiest solution to me:
sudo swapoff -a
sudo fallocate -l 4G /swapfile
sudo chmod 0600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
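One caveat with that approach: in an AppVM, the root filesystem is volatile, so a swap file created at /swapfile will not survive a reboot. A sketch that instead keeps the file on the private volume and re-enables it from /rw/config/rc.local, assuming 4 GB of private-volume space to spare:

```bash
# /rw/config/rc.local -- runs as root at qube startup.
# Create the swap file on the private volume once, then enable it.
if [ ! -f /home/user/swapfile ]; then
    fallocate -l 4G /home/user/swapfile
    chmod 0600 /home/user/swapfile
    mkswap /home/user/swapfile
fi
swapon /home/user/swapfile
```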
> @horizon2021: Please try resetting all initial and max memory settings to their defaults (i.e., 400 MB / 4000 MB) and enabling "include in memory balancing" on all qubes where it's possible. (In other words, put all memory-related settings back to the defaults.) Then reboot the whole system and try using it for a while to see whether the problem persists.
Will do a reinstall with the current version and get back to you. Please give it a while.
> Will do a reinstall with the current version and get back to you. Please give it a while.
FWIW, I don't think a complete reinstall is necessary for the settings I mentioned. In fact, if you back up all of your qubes first and restore them into the new installation, they may still have the old memory settings, so you may still have to change them manually anyway.
Alright, just a reboot then. There are 11 active VMs in total: two HVMs with 400 MB of RAM (USB, network), and the rest all set to the default 400-4000. I recreated all VMs except sys-usb and the one I use Telegram Desktop in. There are three disposable VMs with more or less intense (tabs, Java) Firefox use.
All booted up fine, but there was an unexpected, noticeable delay in some disposable VMs at the beginning: switching tabs and writing something down in an editor both lagged (this usually does not happen with "fresh" VMs that have just been started). It went away after a while, though, and all seemed fine.
At one point I killed one disposable VM and started a new one; that one was really slow due to the RAM limitation. With this default memory setting, you can make memory available from elsewhere by closing applications in another VM, which helped here.
At last I ran into the situation with the swap in one VM filling up to its limit again, though without a total crash, only unresponsiveness for maybe 20-30 seconds; that was the VM that runs the Telegram app. It seemed to do OK until a video was started. This is the earliest copy+paste I could do after the problem occurred:
top - 23:14:28 up 22:40, 2 users, load average: 1.31, 2.05, 1.18
Tasks: 174 total, 3 running, 171 sleeping, 0 stopped, 0 zombie
%Cpu(s): 4.0 us, 3.4 sy, 0.0 ni, 62.9 id, 26.4 wa, 1.4 hi, 0.2 si, 1.8 st
MiB Mem : 684.7 total, 17.8 free, 658.7 used, 51.9 buff/cache
MiB Swap: 1024.0 total, 0.2 free, 1023.8 used. 26.0 avail Mem
Memory usage due to Telegram seems to vary a good bit; I saw this application run fine below 1000 MB of RAM with very little swap for a while, but it can also go up more. Right now:
MiB Mem : 3230.9 total, 1387.6 free, 1419.4 used, 488.3 buff/cache
MiB Swap: 1024.0 total, 333.8 free, 690.2 used. 1811.5 avail Mem
Also, sys-whonix seems to take more than I would expect:
MiB Mem : 1560.2 total, 269.9 free, 1092.7 used, 229.1 buff/cache
MiB Swap: 1024.0 total, 887.0 free, 137.0 used. 467.5 avail Mem
So the issue, as I initially reported it, can be helped by just using the defaults (no VM became totally unresponsive). Memory limitation may still be problematic, since (as the point has been made above) you do not immediately know when you are running into the system limit.
> I thought Firefox had fixed its leaks many years ago 😞.
Firefox also likes to hold memory that is unused but reserved; it can be made to return it via about:unloads, about:processes, or about:performance.
This is a distinct problem from leaks in the browser itself and from leaks in JavaScript applications (YouTube, most notably).