Kernel 6.6 has a memory leak on NFS

Open llevet opened this issue 10 months ago • 7 comments

Describe the bug

Platform: Raspberry Pi 5 8GB with kernel 6.6.20+rpt-rpi-2712

I tracked the memory leak down to NFS. My Raspberry Pi is used as an NFS server with these parameters:

nfsd.conf:
[nfsd]
threads=16
udp=yes
tcp=yes
vers2=no
vers3=yes
vers4=yes
vers4.0=yes
vers4.1=yes
vers4.2=yes

/etc/exports:
/export/cluster-data 172.31.31.160/32(fsid=663f02fb-a2eb-4c16-b809-29da7f5d24c5,rw,async,insecure,all_squash,anonuid=1000,anongid=100,no_subtree_check,rw) 172.31.31.159/32(fsid=5cd4a8cf-642b-4be1-abca-df0532aba469,rw,async,insecure,all_squash,anonuid=1000,anongid=100,no_subtree_check,rw)
/export 172.31.31.160/32(ro,fsid=0,root_squash,no_subtree_check)
/export 172.31.31.159/32(ro,fsid=0,root_squash,no_subtree_check)

On the 2 clients (/etc/mtab):
172.31.31.248:/cluster-data /mnt/pve/cluster-data-pistorage nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.31.31.160,local_lock=none,addr=172.31.31.248 0 0

Steps to reproduce the behaviour

As soon as the clients are connected (Zone 1 and Zone 3 in the image), memory slowly begins to leak, even with no data transfer over NFS. Once the clients are disconnected (Zone 2 in the image), memory usage stays at the same level. R in the image marks a reboot.

The nfsd module stays at the same size in lsmod. The problem is in the kernel's NFS code.
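For anyone who wants to check for the same pattern without a graphing tool, here is a minimal sketch that samples /proc/meminfo and estimates how fast used memory grows. It is not what produced the graphs in this issue. The function names, the choice of MemTotal minus MemAvailable as "used" memory, and the 60-second interval are all assumptions for illustration.

```python
import re
import time


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integer values (mostly kB)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([\w()]+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def used_kb(fields):
    """Used memory in kB, taken here as MemTotal - MemAvailable (an assumption)."""
    return fields["MemTotal"] - fields["MemAvailable"]


def growth_kb_per_hour(samples):
    """Least-squares slope of (seconds, used_kb) samples, scaled to kB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    num = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must have distinct timestamps")
    return num / den * 3600


if __name__ == "__main__":
    # Hypothetical sampling loop: one reading per minute until interrupted.
    samples = []
    start = time.time()
    try:
        while True:
            with open("/proc/meminfo") as f:
                samples.append((time.time() - start, used_kb(parse_meminfo(f.read()))))
            if len(samples) >= 2:
                print(f"{growth_kb_per_hour(samples):.1f} kB/hour over {len(samples)} samples")
            time.sleep(60)
    except KeyboardInterrupt:
        pass
```

A steadily positive slope while the clients are idle, which does not drop back after they disconnect, would match the behaviour described above. Tracking a kernel-side field such as SUnreclaim from the same dict may be more telling, but that is a guess rather than something confirmed in this thread.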

MEM1 MEM2

Device(s)

Raspberry Pi 5

System

cat /etc/rpi-issue
Raspberry Pi reference 2024-03-15
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, f19ee211ddafcae300827f953d143de92a5c6624, stage2

vcgencmd version
2024/02/16 15:28:41
Copyright (c) 2012 Broadcom
version 4c845bd3 (release) (embedded)

uname -a
Linux pistorage 6.6.20+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux

Logs

No response

Additional context

No response

llevet avatar Apr 11 '24 14:04 llevet

After 10 days, when memory becomes full, the Pi restarts by itself after an OOM. You can find more details starting here: https://forums.raspberrypi.com/viewtopic.php?t=361116&start=75#p2202411 This problem doesn't exist on the previous official 6.1 kernel.

llevet avatar Apr 11 '24 14:04 llevet

There are no nfs related downstream commits to the kernel, so this is likely to be an upstream kernel issue. We don't have any special knowledge of kernel nfs code, so this may be tricky to track down.

Some possible approaches: It seems likely that the leak is also present on other platforms, so trying to find similar reports may be useful. Reporting it to upstream kernel devs may be useful. Any nfs related commits present in 6.6 but not present in 6.1 could be reverted in test builds to try to track it down. Testing 6.2, 6.3, 6.4 and 6.5 kernels would help narrow it down further.

popcornmix avatar Apr 11 '24 16:04 popcornmix

Hmm... It seems to have been reported here: https://lore.kernel.org/lkml/[email protected]/ https://bugzilla.kernel.org/show_bug.cgi?id=218671

llevet avatar Apr 11 '24 16:04 llevet

And it's been backported to 6.6.26. I've just bumped rpi-update to that, so if you run sudo rpi-update you should have the fix.
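As a quick way to tell whether a running kernel is at or past that release, here is a small sketch. It compares only the numeric part of the release string against 6.6.26, the version named in this thread. The function names are hypothetical, and returning None for anything outside the 6.6 series is an assumption, since the thread only covers 6.6.

```python
import platform
import re

# First 6.6.x release said in this thread to carry the backported fix.
FIXED_IN = (6, 6, 26)


def parse_release(release):
    """Extract (major, minor, patch) from a string like '6.6.20+rpt-rpi-2712'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def has_nfs_leak_fix(release):
    """True/False within the 6.6 series; None for other series (not covered here)."""
    version = parse_release(release)
    if version[:2] != FIXED_IN[:2]:
        return None
    return version >= FIXED_IN


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    print(running, "->", has_nfs_leak_fix(running))
```

This checks the version number only. It cannot tell whether a vendor kernel with a lower number carries the patch as a separate backport.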

popcornmix avatar Apr 12 '24 11:04 popcornmix

OK, I just updated my Pi:
Linux pistorage 6.6.26-v8-16k+ #1754 SMP PREEMPT Thu Apr 11 14:51:20 BST 2024 aarch64 GNU/Linux

I will report back with the result.

llevet avatar Apr 12 '24 17:04 llevet

The memory leak is gone. I tested for 16 hours and memory usage is stable. When do you think it will be pushed to the official rpi stable kernel? Thanks a lot.

llevet avatar Apr 13 '24 09:04 llevet

When do you think it will be pushed to the official rpi stable kernel?

I'll flag it up as an important bug fix, and let you know when.

popcornmix avatar Apr 14 '24 13:04 popcornmix

The latest apt kernel is 6.6.31-1+rpt1 (2024-05-29), so it contains this fix.

popcornmix avatar Jun 26 '24 14:06 popcornmix