simplefs Fix sb_read and umount cache not be freed

Using kmemleak to detect the leak, I found the the command "ls" and "mv" will make the cache leak

here are the leak informations:

0 [<ffffffffa5d2f98e>] __alloc_pages+0x24e
1 [<ffffffffa5d2f98e>] __alloc_pages+0x24e
2 [<ffffffffa5d51d3e>] alloc_pages+0x9e
3 [<ffffffffa5cbd4de>] __page_cache_alloc+0x7e
4 [<ffffffffa5cc1342>] pagecache_get_page+0x152
5 [<ffffffffa5dedec8>] grow_dev_page+0x48
6 [<ffffffffa5deebac>] __getblk_gfp+0xbc
7 [<ffffffffa5deed01>] __bread_gfp+0x11
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ <-- seems __bread() leak.
8 [<ffffffffc098427b>] ftrace_trampoline+0x427b
9 [<ffffffffc09845a7>] ftrace_trampoline+0x45a7
10 [<ffffffffa5dafaed>] vfs_mkdir+0xad
11 [<ffffffffa5db3368>] do_mkdirat+0x128
12 [<ffffffffa5db351c>] __x64_sys_mkdir+0x4c
13 [<ffffffffa5a04d64>] x64_sys_call+0x94
14 [<ffffffffa67c2ce6>] do_syscall_64+0x56
15 [<ffffffffa68000df>] entry_SYSCALL_64_after_hwframe+0x67

May 19 '24 13:05 RoyWFHuang

I defer to @HotMercury for confirmation.

May 19 '24 14:05 jserv

Hi @RoyWFHuang There is something wrong on my environment to use kmemleak, I will check later.

May 21 '24 09:05 HotMercury

Hi @RoyWFHuang There is something wrong on my environment to use kmemleak, I will check later.

@HotMercury error: attempt to use poisoned "strlcpy" Is that the problem you encounter?

If that, you can go to bpftool/libbpf/ and update it to latest version

May 21 '24 23:05 RoyWFHuang

Thank you for your suggestions. I believe your logic in checking is correct, but I am using kmemleak, and it does not show any leak errors. Please give me more about testing environment like kernel version and below information.

  BPFTOOL  bpftool/bootstrap/bpftool
...                        libbfd: [ on  ]
...               clang-bpf-co-re: [ on  ]
...                          llvm: [ on  ]
...                        libcap: [ on  ]

In my test, there is no error

$ sudo ./kmodleak simplefs -v
module 'simplefs' loaded
module 'simplefs' unloaded

3 stacks with outstanding allocations:
8192 bytes in 2 allocations from stack
	addr = 0xa33 size = 4096
	addr = 0xc2b size = 4096
	0 [<ffffffff9284e7e4>] __alloc_pages+0x264
	1 [<ffffffff9284e7e4>] __alloc_pages+0x264
	2 [<ffffffff928859b1>] alloc_pages_mpol+0x91
	3 [<ffffffff92885f14>] folio_alloc+0x64
	4 [<ffffffff927b5734>] filemap_alloc_folio+0xf4
	5 [<ffffffff927bac5b>] __filemap_get_folio+0x14b
	6 [<ffffffff929411f2>] __getblk_slow+0xb2
	7 [<ffffffff92941591>] __bread_gfp+0x81
	8 [<ffffffffc0710b59>] init_iso9660_fs+0x6b49
	9 [<ffffffff928ff674>] iterate_dir+0x114
	10 [<ffffffff928fff94>] __x64_sys_getdents64+0x84
	11 [<ffffffff92405b73>] x64_sys_call+0x1b43
	12 [<ffffffff9361b78f>] do_syscall_64+0x7f
	13 [<ffffffff9380012b>] entry_SYSCALL_64_after_hwframe+0x73
208 bytes in 2 allocations from stack
	addr = 0xffff898c35e6c8f0 size = 104
	addr = 0xffff898c1d3add00 size = 104
	0 [<ffffffff9285c8a3>] kmem_cache_alloc+0x253
	1 [<ffffffff9285c8a3>] kmem_cache_alloc+0x253
	2 [<ffffffff9293d6be>] alloc_buffer_head+0x1e
	3 [<ffffffff9293ebb5>] folio_alloc_buffers+0x95
	4 [<ffffffff92941238>] __getblk_slow+0xf8
	5 [<ffffffff92941591>] __bread_gfp+0x81
	6 [<ffffffffc0710b59>] init_iso9660_fs+0x6b49
	7 [<ffffffff928ff674>] iterate_dir+0x114
	8 [<ffffffff928fff94>] __x64_sys_getdents64+0x84
	9 [<ffffffff92405b73>] x64_sys_call+0x1b43
	10 [<ffffffff9361b78f>] do_syscall_64+0x7f
	11 [<ffffffff9380012b>] entry_SYSCALL_64_after_hwframe+0x73
192 bytes in 1 allocations from stack
	addr = 0xffff898c015ef180 size = 192
	0 [<ffffffff9285c55b>] kmem_cache_alloc_lru+0x25b
	1 [<ffffffff9285c55b>] kmem_cache_alloc_lru+0x25b
	2 [<ffffffff929046e4>] __d_alloc+0x34
	3 [<ffffffff9290493a>] d_alloc+0x1a
	4 [<ffffffff92907c7a>] d_alloc_parallel+0x5a
	5 [<ffffffff928f2b0c>] __lookup_slow+0x5c
	6 [<ffffffff928f2e0c>] lookup_one_unlocked+0x9c
	7 [<ffffffff928f2ebd>] lookup_positive_unlocked+0x1d
	8 [<ffffffff92aa673d>] debugfs_lookup+0x5d
	9 [<ffffffff92aa679f>] debugfs_lookup_and_remove+0xf
	10 [<ffffffff9285fd69>] debugfs_slab_release+0x19
	11 [<ffffffff927fee5c>] kmem_cache_destroy+0x11c
	12 [<ffffffffc070c625>] init_iso9660_fs+0x2615
	13 [<ffffffffc0710fe9>] init_iso9660_fs+0x6fd9
	14 [<ffffffff925f6943>] __do_sys_delete_module.isra.0+0x1a3
	15 [<ffffffff925f6af2>] __x64_sys_delete_module+0x12
	16 [<ffffffff92406320>] x64_sys_call+0x22f0
	17 [<ffffffff9361b78f>] do_syscall_64+0x7f
	18 [<ffffffff9380012b>] entry_SYSCALL_64_after_hwframe+0x73
done

May 23 '24 04:05 HotMercury

You can test the example in kmodleak, you will see that

1 stacks with outstanding allocations:
128 bytes in 1 allocations from stack
addr = 0xffff8c4204ae7300 size = 128
0 [<ffffffffa5d61b11>] kmem_cache_alloc_trace+0x261
1 [<ffffffffa5d61b11>] kmem_cache_alloc_trace+0x261
2 [<ffffffffc09ae02b>] 
bpf_prog_cc9504c91d44b81c_kmodleak__mm_page_free+0x127f
3 [<ffffffffa5a03a06>] do_one_initcall+0x46
4 [<ffffffffa5b9ec12>] do_init_module+0x52
5 [<ffffffffa5ba062b>] load_module+0xb2b
6 [<ffffffffa5a955b0>] __kretprobe_trampoline+0x0
7 [<ffffffffa5ba0a08>] __x64_sys_finit_module+0x18
8 [<ffffffffa5a06793>] x64_sys_call+0x1ac3
9 [<ffffffffa67c2ce6>] do_syscall_64+0x56
10 [<ffffffffa68000df>] entry_SYSCALL_64_after_hwframe+0x67
done

The checker will not show "error" or "leak" (like valgrind), it only shows the allocating path (that's the eBpf actually doing) So if you see that something show out, that means some leaking in the program

And as your comments, there are 3 leaking in it

7 [<ffffffff92941591>] __bread_gfp+0x81
5 [<ffffffff92941591>] __bread_gfp+0x81
11 [<ffffffff927fee5c>] kmem_cache_destroy+0x11c

Another "11 [] kmem_cache_destroy+0x11c", The problem bothers me, Could you give me you mail that we can discuss this?

May 23 '24 13:05 RoyWFHuang

@HotMercury, can you confirm the effectiveness of this proposed change?

Jun 05 '24 03:06 jserv

Hi @RoyWFHuang You can use rcu_barrier() or a kernel thread to solve the problem where simplefs_destroy_inode_cache does not release in time before unmounting simplefs.

Jun 17 '24 15:06 HotMercury

You can use rcu_barrier() or a kernel thread to solve the problem where simplefs_destroy_inode_cache does not release in time before unmounting simplefs.

You should paste the proposed changes for dedicated discussions.

Jun 17 '24 15:06 jserv

Make the git commit messages informative. In particular, mention the way how you figure out the potential leaks.

@RoyWFHuang, can you improve the git commit messages?

Jul 12 '24 02:07 jserv

Make the git commit messages informative. In particular, mention the way how you figure out the potential leaks.

@RoyWFHuang, can you improve the git commit messages?

Yes, just wait a few days, I need some time to conclude it.

Jul 13 '24 01:07 RoyWFHuang

Thank @RoyWFHuang for contributing!

Aug 04 '24 16:08 jserv