odroid_u3: problem: Out of Memory for both transmission-daemon and qbittorrent
So I have been trying out the new Odroid 22.04 image (kernel 5.19.1 and kernel 5.18.1) as a 24/7 torrent box. Sad to say, both of these programs stop abruptly with syslog messages like the following:
Aug 21 01:19:51 odroid kernel: [91982.065450] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1001.slice/session-c1.scope,task=qbittorrent,pid=2060,uid=1001
Aug 21 01:19:51 odroid kernel: [91982.083483] Out of memory: Killed process 2060 (qbittorrent) total-vm:507656kB, anon-rss:139364kB, file-rss:82688kB, shmem-rss:2932kB, UID:1001 pgtables:384kB oom_score_adj:0
Aug 21 01:19:51 odroid systemd[1]: session-c1.scope: A process of this unit has been killed by the OOM killer.
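For anyone comparing kills across runs, the resident-memory fields in the "Out of memory: Killed process" line can be added up with a small filter. This is only a sketch: the function name `oom_rss_total` is made up for illustration, and it assumes the field layout shown in the log above.

```shell
#!/bin/sh
# Hypothetical helper: read syslog text on stdin and, for every
# "Out of memory: Killed process" line, print "<process> <rss_kB>",
# where rss_kB = anon-rss + file-rss + shmem-rss.
# Assumes the field order seen in the log quoted above.
oom_rss_total() {
    grep 'Out of memory: Killed process' |
    sed -n 's/.*Killed process [0-9]* (\([^)]*\)).*anon-rss:\([0-9]*\)kB, file-rss:\([0-9]*\)kB, shmem-rss:\([0-9]*\)kB.*/\1 \2 \3 \4/p' |
    awk '{ print $1, $2 + $3 + $4 }'
}
```

Something like `oom_rss_total < /var/log/syslog` would list one line per kill. For the kill quoted above the sum is 139364 + 82688 + 2932 = 224984 kB, i.e. roughly 220 MB resident at the time qbittorrent was killed.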
I run them one at a time (not concurrently) to see whether they interfere with each other, but the same messages appear. Qbittorrent is the latest version, while transmission-daemon was installed from the nightly a few days ago, so both should be very recent versions. When they stop, the client typically has 12-18 torrents running at the same time.
I have run the same number of torrents on my XU4 with the latest image without issues. And even there the transmission client was the baked-in one, which from what I have read has lots of memory leak issues (but it does not appear to crash on the XU4).
I have also tried the same torrents and clients on the U3 with Ubuntu 20.04, which had been upgraded from Ubuntu 16 to Ubuntu 20 using the do-release-upgrade command, on kernel 4.16.0-v7. So far, they have not crashed with out of memory there.
@sirzur - you may try to disable zswap and/or mglru - there should be comments on how to enable/disable them in /etc/rc.local both times via sysfs - please let me know if it helps and turning off which of the two helped ... i think mglru modifies the oom killer to keep the system responsive under very high memory pressure and prefers to kill large processes before the system gets unusable ... also i think ubuntu 22.04 enabled some oom killer as well which made some trouble in the past - not sure if this is the case (i.e. it being around at all) for the arm version too
Ok. More information. I have disabled both zswap and mglru. But now I have a freezing problem.
To disable the zswap, I changed the following line in /etc/rc.local
echo 0 > /sys/module/zswap/parameters/enabled
To disable mglru, I installed a useful utility called mg-lru-helper from github and ran the recommended command to disable mglru and confirmed disabled.
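To double-check that both switches really took effect, the values can also be read straight back from sysfs. The sketch below assumes the MGLRU switch lives at /sys/kernel/mm/lru_gen/enabled (the path used by the multi-gen LRU patches as far as I know); the function name and the optional root argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the zswap and MGLRU switches from sysfs.
# $1 is an optional alternative root (defaults to /sys), handy for a dry run.
# Assumption: the MGLRU switch is exposed as kernel/mm/lru_gen/enabled.
show_mm_switches() {
    root="${1:-/sys}"
    for f in module/zswap/parameters/enabled \
             module/zswap/parameters/zpool \
             kernel/mm/lru_gen/enabled; do
        if [ -r "$root/$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$root/$f")"
        else
            printf '%s = (not present)\n' "$f"
        fi
    done
}
```

Running `show_mm_switches` after boot should show `N` for the zswap `enabled` parameter once the rc.local change is active.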
After disabling both zswap and mglru, I ran qbittorrent and saw the following messages in both syslog and kern.log. The messages consistently appear after a reboot, just after starting qbittorrent, which then freezes almost immediately after launching. From googling the circular locking dependency warning, it looks like a possible kernel issue, but I am not knowledgeable enough to fix it. Any ideas?
The error messages are as follows:
Aug 22 14:31:02 odroid kernel: [ 232.743082]
Aug 22 14:31:02 odroid kernel: [ 232.743126] ======================================================
Aug 22 14:31:02 odroid kernel: [ 232.745112] WARNING: possible circular locking dependency detected
Aug 22 14:31:02 odroid kernel: [ 232.751280] 5.19.1-stb-exy+ #2 Not tainted
Aug 22 14:31:02 odroid kernel: [ 232.755353] ------------------------------------------------------
Aug 22 14:31:02 odroid kernel: [ 232.761511] kswapd0/53 is trying to acquire lock:
Aug 22 14:31:02 odroid kernel: [ 232.766198] c150ba1c (regulator_list_mutex){+.+.}-{3:3}, at: regulator_lock_dependent+0x44/0x2a4
Aug 22 14:31:02 odroid kernel: [ 232.774964]
Aug 22 14:31:02 odroid kernel: [ 232.774964] but task is already holding lock:
Aug 22 14:31:02 odroid kernel: [ 232.780780] c149bde4 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x220/0x1020
Aug 22 14:31:02 odroid kernel: [ 232.787290]
Aug 22 14:31:02 odroid kernel: [ 232.787290] which lock already depends on the new lock.
Aug 22 14:31:02 odroid kernel: [ 232.787290]
Aug 22 14:31:02 odroid kernel: [ 232.795449]
Aug 22 14:31:02 odroid kernel: [ 232.795449] the existing dependency chain (in reverse order) is:
Aug 22 14:31:02 odroid kernel: [ 232.802914]
Aug 22 14:31:02 odroid kernel: [ 232.802914] -> #2 (fs_reclaim){+.+.}-{0:0}:
Aug 22 14:31:02 odroid kernel: [ 232.808557] kmem_cache_alloc_lru+0x40/0x5f4
Aug 22 14:31:02 odroid kernel: [ 232.813330] __d_alloc+0x2c/0x1f8
Aug 22 14:31:02 odroid kernel: [ 232.817150] d_alloc_parallel+0x58/0xa5c
Aug 22 14:31:02 odroid kernel: [ 232.821576] __lookup_slow+0x94/0x184
Aug 22 14:31:02 odroid kernel: [ 232.825742] lookup_one_len+0xa4/0xe4
Aug 22 14:31:02 odroid kernel: [ 232.829909] start_creating+0xb4/0x170
Aug 22 14:31:02 odroid kernel: [ 232.834162] debugfs_create_dir+0x10/0x130
Aug 22 14:31:02 odroid kernel: [ 232.838762] pinctrl_init+0x2c/0xd4
Aug 22 14:31:02 odroid kernel: [ 232.842755] do_one_initcall+0x70/0x3a8
Aug 22 14:31:02 odroid kernel: [ 232.847096] kernel_init_freeable+0x2ac/0x314
Aug 22 14:31:02 odroid kernel: [ 232.851956] kernel_init+0x18/0x12c
Aug 22 14:31:02 odroid kernel: [ 232.855949] ret_from_fork+0x14/0x2c
Aug 22 14:31:02 odroid kernel: [ 232.860028] 0x0
Aug 22 14:31:02 odroid kernel: [ 232.862372]
Aug 22 14:31:02 odroid kernel: [ 232.862372] -> #1 (&sb->s_type->i_mutex_key#2){++++}-{3:3}:
Aug 22 14:31:02 odroid kernel: [ 232.869403] simple_recursive_removal+0x80/0x35c
Aug 22 14:31:02 odroid kernel: [ 232.874524] debugfs_remove+0x38/0x4c
Aug 22 14:31:02 odroid kernel: [ 232.878690] _regulator_put.part.0+0x34/0x1b8
Aug 22 14:31:02 odroid kernel: [ 232.883551] regulator_put+0x2c/0x3c
Aug 22 14:31:02 odroid kernel: [ 232.887631] dev_pm_opp_put_regulators+0x70/0xf4
Aug 22 14:31:02 odroid kernel: [ 232.892752] exynos_bus_probe+0x23c/0x680
Aug 22 14:31:02 odroid kernel: [ 232.897265] platform_probe+0x5c/0xb8
Aug 22 14:31:02 odroid kernel: [ 232.901432] really_probe+0x174/0x404
Aug 22 14:31:02 odroid kernel: [ 232.905599] __driver_probe_device+0xa0/0x204
Aug 22 14:31:02 odroid kernel: [ 232.910459] driver_probe_device+0x34/0xc4
Aug 22 14:31:02 odroid kernel: [ 232.915059] __driver_attach+0xf0/0x1e4
Aug 22 14:31:02 odroid kernel: [ 232.919399] bus_for_each_dev+0x74/0xc0
Aug 22 14:31:02 odroid kernel: [ 232.923739] bus_add_driver+0x174/0x218
Aug 22 14:31:02 odroid kernel: [ 232.928079] driver_register+0x88/0x11c
Aug 22 14:31:02 odroid kernel: [ 232.932419] do_one_initcall+0x70/0x3a8
Aug 22 14:31:02 odroid kernel: [ 232.936760] kernel_init_freeable+0x2ac/0x314
Aug 22 14:31:02 odroid kernel: [ 232.941620] kernel_init+0x18/0x12c
Aug 22 14:31:02 odroid kernel: [ 232.945613] ret_from_fork+0x14/0x2c
Aug 22 14:31:02 odroid kernel: [ 232.949692] 0x0
Aug 22 14:31:02 odroid kernel: [ 232.952037]
Aug 22 14:31:02 odroid kernel: [ 232.952037] -> #0 (regulator_list_mutex){+.+.}-{3:3}:
Aug 22 14:31:02 odroid kernel: [ 232.958546] lock_acquire+0x128/0x3e8
Aug 22 14:31:02 odroid kernel: [ 232.962713] __mutex_lock+0x98/0x958
Aug 22 14:31:02 odroid kernel: [ 232.966792] mutex_lock_nested+0x1c/0x24
Aug 22 14:31:02 odroid kernel: [ 232.971219] regulator_lock_dependent+0x44/0x2a4
Aug 22 14:31:02 odroid kernel: [ 232.976340] regulator_set_voltage+0x30/0x84
Aug 22 14:31:02 odroid kernel: [ 232.981114] mmc_regulator_set_ocr+0x44/0xdc
Aug 22 14:31:02 odroid kernel: [ 232.985888] sdhci_set_power+0x28/0x60
Aug 22 14:31:02 odroid kernel: [ 232.990142] sdhci_set_ios+0x3c8/0x454
Aug 22 14:31:02 odroid kernel: [ 232.994395] sdhci_runtime_resume_host+0x88/0x1bc
Aug 22 14:31:02 odroid kernel: [ 232.999603] __rpm_callback+0x3c/0x108
Aug 22 14:31:02 odroid kernel: [ 233.003856] rpm_callback+0x28/0x54
Aug 22 14:31:02 odroid kernel: [ 233.007850] rpm_resume+0x578/0x798
Aug 22 14:31:02 odroid kernel: [ 233.011842] __pm_runtime_resume+0x48/0xa0
Aug 22 14:31:02 odroid kernel: [ 233.016442] __mmc_claim_host+0x1b8/0x214
Aug 22 14:31:02 odroid kernel: [ 233.020955] mmc_mq_queue_rq+0x248/0x250
Aug 22 14:31:02 odroid kernel: [ 233.025382] __blk_mq_try_issue_directly+0x164/0x1b0
Aug 22 14:31:02 odroid kernel: [ 233.030851] blk_mq_plug_issue_direct.constprop.0+0xe0/0x52c
Aug 22 14:31:02 odroid kernel: [ 233.037013] blk_mq_flush_plug_list+0x3f0/0x64c
Aug 22 14:31:02 odroid kernel: [ 233.042048] __blk_flush_plug+0xd8/0x130
Aug 22 14:31:02 odroid kernel: [ 233.046475] blk_finish_plug+0x1c/0x28
Aug 22 14:31:02 odroid kernel: [ 233.050728] shrink_lruvec+0x740/0xf84
Aug 22 14:31:02 odroid kernel: [ 233.054981] shrink_node+0x1c0/0x7a0
Aug 22 14:31:02 odroid kernel: [ 233.059061] kswapd+0x50c/0x1020
Aug 22 14:31:02 odroid kernel: [ 233.062793] kthread+0xf4/0x128
Aug 22 14:31:02 odroid kernel: [ 233.066439] ret_from_fork+0x14/0x2c
Aug 22 14:31:02 odroid kernel: [ 233.070518] 0x0
Aug 22 14:31:02 odroid kernel: [ 233.072862]
Aug 22 14:31:02 odroid kernel: [ 233.072862] other info that might help us debug this:
Aug 22 14:31:02 odroid kernel: [ 233.072862]
Aug 22 14:31:02 odroid kernel: [ 233.080848] Chain exists of:
Aug 22 14:31:02 odroid kernel: [ 233.080848] regulator_list_mutex --> &sb->s_type->i_mutex_key#2 --> fs_reclaim
Aug 22 14:31:02 odroid kernel: [ 233.080848]
Aug 22 14:31:02 odroid kernel: [ 233.092565] Possible unsafe locking scenario:
Aug 22 14:31:02 odroid kernel: [ 233.092565]
Aug 22 14:31:02 odroid kernel: [ 233.098468]        CPU0                    CPU1
Aug 22 14:31:02 odroid kernel: [ 233.102982]        ----                    ----
Aug 22 14:31:02 odroid kernel: [ 233.107495]   lock(fs_reclaim);
Aug 22 14:31:02 odroid kernel: [ 233.110620]                                lock(&sb->s_type->i_mutex_key#2);
Aug 22 14:31:02 odroid kernel: [ 233.117651]                                lock(fs_reclaim);
Aug 22 14:31:02 odroid kernel: [ 233.123293]   lock(regulator_list_mutex);
Aug 22 14:31:02 odroid kernel: [ 233.127286]
Aug 22 14:31:02 odroid kernel: [ 233.127286] *** DEADLOCK ***
Aug 22 14:31:02 odroid kernel: [ 233.127286]
Aug 22 14:31:02 odroid kernel: [ 233.133188] 2 locks held by kswapd0/53:
Aug 22 14:31:02 odroid kernel: [ 233.137007] #0: c149bde4 (fs_reclaim){+.+.}-{0:0}, at: kswapd+0x220/0x1020
Aug 22 14:31:02 odroid kernel: [ 233.143951] #1: c4528704 (q->srcu){....}-{0:0}, at: blk_mq_flush_plug_list+0x39c/0x64c
Aug 22 14:31:02 odroid kernel: [ 233.151936]
Aug 22 14:31:02 odroid kernel: [ 233.151936] stack backtrace:
Aug 22 14:31:02 odroid kernel: [ 233.156278] CPU: 1 PID: 53 Comm: kswapd0 Not tainted 5.19.1-stb-exy+ #2
Aug 22 14:31:02 odroid kernel: [ 233.162874] Hardware name: Samsung Exynos (Flattened Device Tree)
Aug 22 14:31:02 odroid kernel: [ 233.168955] unwind_backtrace from show_stack+0x10/0x14
Aug 22 14:31:02 odroid kernel: [ 233.174159] show_stack from dump_stack_lvl+0x58/0x70
Aug 22 14:31:02 odroid kernel: [ 233.179195] dump_stack_lvl from check_noncircular+0xfc/0x168
Aug 22 14:31:02 odroid kernel: [ 233.184923] check_noncircular from __lock_acquire+0x1740/0x31bc
Aug 22 14:31:02 odroid kernel: [ 233.190911] __lock_acquire from lock_acquire+0x128/0x3e8
Aug 22 14:31:02 odroid kernel: [ 233.196293] lock_acquire from __mutex_lock+0x98/0x958
Aug 22 14:31:02 odroid kernel: [ 233.201414] __mutex_lock from mutex_lock_nested+0x1c/0x24
Aug 22 14:31:02 odroid kernel: [ 233.206881] mutex_lock_nested from regulator_lock_dependent+0x44/0x2a4
Aug 22 14:31:02 odroid kernel: [ 233.213481] regulator_lock_dependent from regulator_set_voltage+0x30/0x84
Aug 22 14:31:02 odroid kernel: [ 233.220336] regulator_set_voltage from mmc_regulator_set_ocr+0x44/0xdc
Aug 22 14:31:02 odroid kernel: [ 233.226933] mmc_regulator_set_ocr from sdhci_set_power+0x28/0x60
Aug 22 14:31:02 odroid kernel: [ 233.233010] sdhci_set_power from sdhci_set_ios+0x3c8/0x454
Aug 22 14:31:02 odroid kernel: [ 233.238564] sdhci_set_ios from sdhci_runtime_resume_host+0x88/0x1bc
Aug 22 14:31:02 odroid kernel: [ 233.244901] sdhci_runtime_resume_host from __rpm_callback+0x3c/0x108
Aug 22 14:31:02 odroid kernel: [ 233.251325] __rpm_callback from rpm_callback+0x28/0x54
Aug 22 14:31:02 odroid kernel: [ 233.256531] rpm_callback from rpm_resume+0x578/0x798
Aug 22 14:31:02 odroid kernel: [ 233.261565] rpm_resume from __pm_runtime_resume+0x48/0xa0
Aug 22 14:31:02 odroid kernel: [ 233.267034] __pm_runtime_resume from __mmc_claim_host+0x1b8/0x214
Aug 22 14:31:02 odroid kernel: [ 233.273198] __mmc_claim_host from mmc_mq_queue_rq+0x248/0x250
Aug 22 14:31:02 odroid kernel: [ 233.279013] mmc_mq_queue_rq from __blk_mq_try_issue_directly+0x164/0x1b0
Aug 22 14:31:02 odroid kernel: [ 233.285784] __blk_mq_try_issue_directly from blk_mq_plug_issue_direct.constprop.0+0xe0/0x52c
Aug 22 14:31:02 odroid kernel: [ 233.294291] blk_mq_plug_issue_direct.constprop.0 from blk_mq_flush_plug_list+0x3f0/0x64c
Aug 22 14:31:02 odroid kernel: [ 233.302449] blk_mq_flush_plug_list from __blk_flush_plug+0xd8/0x130
Aug 22 14:31:02 odroid kernel: [ 233.308785] __blk_flush_plug from blk_finish_plug+0x1c/0x28
Aug 22 14:31:02 odroid kernel: [ 233.314426] blk_finish_plug from shrink_lruvec+0x740/0xf84
Aug 22 14:31:02 odroid kernel: [ 233.319984] shrink_lruvec from shrink_node+0x1c0/0x7a0
Aug 22 14:31:02 odroid kernel: [ 233.325190] shrink_node from kswapd+0x50c/0x1020
Aug 22 14:31:02 odroid kernel: [ 233.329877] kswapd from kthread+0xf4/0x128
Aug 22 14:31:02 odroid kernel: [ 233.334044] kthread from ret_from_fork+0x14/0x2c
Aug 22 14:31:02 odroid kernel: [ 233.338731] Exception stack(0xf0b31fb0 to 0xf0b31ff8)
Aug 22 14:31:02 odroid kernel: [ 233.343768] 1fa0: 00000000 00000000 00000000 00000000
Aug 22 14:31:02 odroid kernel: [ 233.351926] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 22 14:31:02 odroid kernel: [ 233.360085] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
@sirzur - maybe you put too much memory pressure on that little odroid or maybe you hit some hidden bug in the kernel somewhere i would guess - some things come to my mind:
- does it also happen if you increase the swapsize? - /scripts/recreate-swapfile.sh 2G
- does it also happen with an older kernel? - like: https://github.com/hexdump0815/linux-mainline-and-mali-generic-stable-kernel/releases/tag/5.15.22-stb-exy%2B or https://github.com/hexdump0815/linux-mainline-and-mali-generic-stable-kernel/releases/tag/5.10.45-stb-exy%2B or https://github.com/hexdump0815/linux-mainline-and-mali-generic-stable-kernel/releases/tag/5.4.58-stb-exy%2B
Thanks for your suggestions and help. I have been doing a whole lot of troubleshooting today, so much so that I can't keep track of what causes what in the end. But regarding your suggestions:
- increasing swap size - does not help, but I learnt a new way to increase swap size
- older kernels - the 5.4.58 does not boot. The other two do, but they do not seem to alleviate the issue
I then tried, unsuccessfully, to compile older versions of qbittorrent, to eliminate the program itself as the cause. Since that failed, I moved forward to the nightly instead and managed to install it. Marginal improvement. I uninstalled that and installed the headless version, to see whether the GUI was taking up memory and causing the issue. It was while running the headless qbittorrent that I noticed it could not process files with non-ASCII characters: the torrents would error out and stop downloading, whereas the GUI would just freeze up. So I chased that until I found an issue raised in the qbittorrent GitHub about the program replacing non-ASCII characters with a dot. The solution was to set the locale to en_US.UTF-8. And all of a sudden the qbittorrent freezes were gone. So I am back to monitoring whether the programs remain stable over time. I also wonder whether the transmission-daemon issue was with non-ASCII characters too. All of this later troubleshooting was done on the 5.19.1 kernel, with no changes to zswap, no increased swap size, and without turning off mglru.
Bottom line is that the issue may not lie with the image or kernels, but with the programs I am using. Will keep you informed.
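A quick way to see whether a shell or service environment would hit the non-ASCII problem is to check which character-set locale is in effect. The sketch below follows the usual precedence (LC_ALL, then LC_CTYPE, then LANG); the function name `is_utf8_locale` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the effective character-set locale is UTF-8.
# Precedence per POSIX: LC_ALL overrides LC_CTYPE, which overrides LANG.
is_utf8_locale() {
    effective="${LC_ALL:-${LC_CTYPE:-${LANG:-}}}"
    case "$effective" in
        *.UTF-8*|*.utf8*|*.UTF8*|*.utf-8*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_utf8_locale || echo "locale is not UTF-8"` could go at the top of a start script. On Ubuntu the system default is normally changed with `sudo update-locale LANG=en_US.UTF-8`, assuming that locale has been generated; a client started by systemd may need LANG set in its own unit as well.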
Wanted to circle back and touch base. I have been doing a whole lot of testing on both the latest image and your previous Ubuntu 20.04 image. I have semi-concluded that a big part of the issue is probably the two torrent clients that I use - qbittorrent and transmission. Although they are stable applications, they still have ongoing issues. In the end, I have done two things which seem to have stabilised the situation (or at least there has been no overnight crash):
- I used a static binary (portable) version of qbittorrent-nox (headless) called qbittorrent-nox-static, which uses Qt5 and an earlier version of libtorrent, and that helped. I also compiled the transmission nightly and used that
- I have also set up a systemd service file to start the two torrent clients and used OOMScoreAdjust=-800 under the [Service] heading to prevent OOM from killing the service. Probably not a good workaround, but it seems to remain stable
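The OOMScoreAdjust workaround above can also be applied as a drop-in next to an existing unit instead of editing the unit file. This is only a sketch: the function name, the `oom.conf` file name and the example service name are assumptions, not what was actually used on the U3.

```shell
#!/bin/sh
# Hypothetical helper: write a systemd drop-in that lowers the OOM score
# of a service. $1 = service name without ".service" (e.g. qbittorrent-nox,
# an assumed name), $2 = optional base directory (defaults to /etc/systemd/system).
write_oom_dropin() {
    unit="$1"
    base="${2:-/etc/systemd/system}"
    mkdir -p "$base/$unit.service.d"
    # Section names are case-sensitive, so it has to be [Service].
    cat > "$base/$unit.service.d/oom.conf" <<'EOF'
[Service]
OOMScoreAdjust=-800
EOF
}
```

After something like `sudo sh -c '. ./oom-dropin.sh; write_oom_dropin qbittorrent-nox'` (script name assumed), a `systemctl daemon-reload` and a restart of the service are needed for the setting to apply. Note that a strongly negative score only makes the kernel pick other processes first; it does not add any memory.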
I also can't help thinking that MGLRU may not have helped because it is still so new in the community and there is no exposure to this from the torrent client's perspective. It will be interesting to see whether Linux kernel 6.1 with full MGLRU implementation will cause wider issues in the community
@sirzur - thanks a lot for the feedback
regarding mglru i think it should be one of the most stable patches finding their way into the kernel as it is in use for years in millions of android devices and chromebooks and in other environments - the info in the readme is quite impressive: https://github.com/hexdump0815/kernel-extra-patches/blob/main/multi-gen-lru/v13/readme.txt
Impressive that there are currently so many real-world implementations of MGLRU. By the way, I did try to use the v14 patches from this git, but the patches complained about a couple of missing files. It still patched the rest, and I will try using the 5.19.1 kernel with the v14 MGLRU patches soon.
@sirzur - v14 is intended for v6.0, so it will not patch cleanly against v5.19 ... there are no real major changes compared to v13 besides updating it for the newer kernel i think, so v13 should be fine for now
@sirzur - maybe this is interesting for you: https://github.com/hexdump0815/imagebuilder/commit/f4aaabf3dbdafce9e7f7d6c723737014b39c6605 - i.e. in case you run into oom kills, it might be worth to disable mglru and/or zswap
LOL. Plenty of people out there with OOM issues on Ubuntu 22.04LTS, starting in June. I am glad that I am not the only one https://techhq.com/2022/07/ubuntu-22-oomd-app-killer-memory-pressure/
Some people have resorted to killing the OOM service https://askubuntu.com/questions/1404888/how-do-i-disable-the-systemd-oom-process-killer-in-ubuntu-22-04
And there seems to be a fix which has been released https://www.cosfone.com/ubuntu-22-04-has-fixed-the-problem-of-oomd-killing-apps/
Just recently I rebooted my Odroid U3, which had been running fine for a while. As I mentioned earlier, I used a systemd service file with OOMScoreAdjust=-800 under the [Service] heading to prevent OOM from killing the service (qbittorrent and transmission-daemon). Then the OS started OOM killing my samba connection and also the desktop. After googling the problem again and reading the links above, I ran the software updater and saw that the update includes the fix above. So I updated and have not had any problem since. I think I just set the swap file to 2GB and have not disabled MGLRU or fiddled with zswap. It has been running stable for about a week now.
@sirzur - after some hints and debugging with the mglru author we found what the cause of the oom kills was (those were oom kills from the kernel and not via ubuntu systemd oom stuff): the zswap pool parameter needs to be changed from z3fold to zsmalloc as zbud and z3fold can only allocate from normal memory and not from highmem and normal mem on the odroid u3 is only about 700mb - due to this the system ran out of memory in this region and the oom killer kicked in - zsmalloc can allocate from both normal and highmem and as a nice side effect it allows higher compression ratios than 1:3 even :) ... so in case you want to go extra safe, maybe change z3fold to zsmalloc in your rc.local ... i will update the imagebuilder config soon and future updates will have this adjusted out of the box then ...
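The normal/highmem split described above can be watched directly: 32-bit kernels built with highmem report LowTotal/LowFree and HighTotal/HighFree in /proc/meminfo. A minimal sketch, with a made-up function name and an optional file argument for trying it on a saved copy:

```shell
#!/bin/sh
# Hypothetical helper: print low (normal) and high memory totals and free
# amounts from a meminfo file. $1 defaults to /proc/meminfo.
# The Low*/High* fields only exist on kernels built with highmem support.
lowmem_report() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^LowTotal:/  { lt = $2 }
        /^LowFree:/   { lf = $2 }
        /^HighTotal:/ { ht = $2 }
        /^HighFree:/  { hf = $2 }
        END {
            if (lt == "") { print "no LowTotal/HighTotal fields (kernel without highmem)"; exit }
            printf "lowmem: %d kB total, %d kB free\n", lt, lf
            printf "highmem: %d kB total, %d kB free\n", ht, hf
        }
    ' "$meminfo"
}
```

Running `lowmem_report` periodically while the torrent client is busy should show whether LowFree shrinks towards zero while HighFree still has room, which would match the explanation above.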
best wishes - hexdump
Thanks for the update and the debugging. In the back of my mind, I have always wondered about the oom, but did not think through it enough to differentiate between kernel oom and the actual OS oom
So in the rc.local, only one line needs to be changed:
echo z3fold > /sys/module/zswap/parameters/zpool
is changed to
echo zsmalloc > /sys/module/zswap/parameters/zpool
Is there any other file / location where zswap is enabled? Or is everything done from the rc.local file?
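The one-line change can also be scripted, which is handy when several boards need it. The sketch below assumes GNU sed (for `-i`) and that the rc.local line has the form quoted above; the function name is made up, and a backup copy is kept as `rc.local.bak`.

```shell
#!/bin/sh
# Hypothetical helper: switch the zswap zpool line in rc.local from
# z3fold to zsmalloc. $1 defaults to /etc/rc.local; a .bak copy is kept.
# Commented-out lines are left alone because the match is anchored at "echo".
switch_zpool() {
    rc="${1:-/etc/rc.local}"
    cp "$rc" "$rc.bak"
    sed -i 's|^\([[:space:]]*echo[[:space:]]*\)z3fold\([[:space:]]*>[[:space:]]*/sys/module/zswap/parameters/zpool\)|\1zsmalloc\2|' "$rc"
}
```

After a reboot, `cat /sys/module/zswap/parameters/zpool` should then print zsmalloc, assuming the kernel was built with zsmalloc support.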
@sirzur - in those images here it's only set in rc.local