linux-nova
OOM with FxMark MWUL
Reproducible both on a raw machine and in a QEMU VM. MWCM and MWCL work on the raw machine, but MWUL crashes with the same setup. I'm not sure whether it's related to issue #1, since this is reproducible on a machine with large DRAM and only happens with the unlink workload.
Raw machine config: CPU: AMD Ryzen Threadripper 2990WX 32-Core Processor; DRAM: 8*16G DDR4; DRAM emulated persistent memory: 10G, mounted with NOVA file system; OS: Ubuntu 18.04.3 LTS, with linux-nova 5.1.0+ kernel.
QEMU VM config: KVM enabled; CPU: 4 virtual CPU cores; DRAM: 1G; DRAM emulated persistent memory: 512M, mounted with NOVA file system; OS: Ubuntu 18.04.3 LTS, with linux-nova 5.1.0+ kernel.
In FxMark main function:
run_config = [
    (Runner.CORE_FINE_GRAIN,
     PerfMon.LEVEL_LOW,
     ("nvme", "*", "MWUL", "*", "directio")),
]
(I've ported NOVA mount for FxMark.)
I can't capture the kernel error message on the raw machine console.
Kernel log from QEMU:
[ 1082.306675] fxmark invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[ 1082.326329] CPU: 0 PID: 12304 Comm: fxmark Tainted: G W 5.1.0+ #1
[ 1082.330390] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
[ 1082.336790] Call Trace:
[ 1082.338281] dump_stack+0x85/0xcb
[ 1082.347124] dump_header+0x57/0x550
[ 1082.348669] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 1082.351656] oom_kill_process+0xb5/0x290
[ 1082.355074] out_of_memory+0xf3/0x680
[ 1082.358184] __alloc_pages_slowpath+0xc12/0xf70
[ 1082.362263] ? find_held_lock+0x34/0xa0
[ 1082.364283] __alloc_pages_nodemask+0x31c/0x390
[ 1082.366430] pagecache_get_page+0xa5/0x2e0
[ 1082.368380] filemap_fault+0x32c/0x8d0
[ 1082.370150] ? ext4_filemap_fault+0x27/0x3e
[ 1082.372003] ext4_filemap_fault+0x2f/0x3e
[ 1082.373753] __do_fault+0x53/0x129
[ 1082.375279] __handle_mm_fault+0xd0e/0x1110
[ 1082.377123] __do_page_fault+0x34a/0x5b0
[ 1082.378852] ? async_page_fault+0x8/0x30
[ 1082.380760] async_page_fault+0x1e/0x30
[ 1082.382589] RIP: 0033:0x7f44805d3c8e
[ 1082.384410] Code: Bad RIP value.
[ 1082.385992] RSP: 002b:00007ffc5aa53a60 EFLAGS: 00010246
[ 1082.388492] RAX: 0000000000000003 RBX: 00007f4480ad2040 RCX: 00007f44805d3c8e
[ 1082.391805] RDX: 0000000000000042 RSI: 00007ffc5aa53ae0 RDI: 00000000ffffff9c
[ 1082.395144] RBP: 00007ffc5aa53ae0 R08: 0000000000000000 R09: 0000000000000000
[ 1082.398266] R10: 00000000000001c0 R11: 0000000000000246 R12: 000055ee52638c6f
[ 1082.401381] R13: 0000000000000000 R14: 00007f4480ac9000 R15: 000055ee5283b680
[ 1082.412623] Mem-Info:
[ 1082.415381] active_anon:1 inactive_anon:0 isolated_anon:0
[ 1082.415381] active_file:0 inactive_file:19 isolated_file:0
[ 1082.415381] unevictable:0 dirty:0 writeback:0 unstable:0
[ 1082.415381] slab_reclaimable:77402 slab_unreclaimable:13589
[ 1082.415381] mapped:32 shmem:0 pagetables:1169 bounce:0
[ 1082.415381] free:1960 free_pcp:364 free_cma:0
[ 1082.448728] Node 0 active_anon:4kB inactive_anon:80kB active_file:36kB inactive_file:68kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:308kB dirty:0kB writeback:0kB shmem:0kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[ 1082.485340] Node 0 DMA free:3192kB min:100kB low:124kB high:148kB active_anon:0kB inactive_anon:4kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:48kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[ 1082.497058] lowmem_reserve[]: 0 367 367 367 367
[ 1082.499104] Node 0 DMA32 free:15684kB min:7800kB low:8400kB high:9000kB active_anon:4kB inactive_anon:84kB active_file:268kB inactive_file:2820kB unevictable:0kB writepending:0kB present:507760kB managed:383712kB mlocked:0kB kernel_stack:2176kB pagetables:4628kB bounce:0kB free_pcp:2524kB local_pcp:648kB free_cma:0kB
[ 1082.514127] lowmem_reserve[]: 0 0 0 0 0
[ 1082.515838] Node 0 DMA: 148*4kB (EH) 30*8kB (EH) 26*16kB (EH) 26*32kB (EH) 15*64kB (EH) 3*128kB (EH) 1*256kB (H) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3680kB
[ 1082.525186] Node 0 DMA32: 0*4kB 313*8kB (UE) 226*16kB (UE) 301*32kB (UE) 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15752kB
[ 1082.543391] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[ 1082.549303] 2435 total pagecache pages
[ 1082.550993] 61 pages in swap cache
[ 1082.552498] Swap cache stats: add 18883, delete 18822, find 813/1657
[ 1082.555320] Free swap = 1825760kB
[ 1082.556820] Total swap = 1885180kB
[ 1082.558336] 130938 pages RAM
[ 1082.561099] 0 pages HighMem/MovableOnly
[ 1082.564939] 31033 pages reserved
[ 1082.567954] 0 pages cma reserved
[ 1082.570760] 0 pages hwpoisoned
[ 1082.573592] Unreclaimable slab info:
[ 1082.577436] Name Used Total
[ 1082.582173] fib6_nodes 5KB 8KB
[ 1082.586625] ip6_dst_cache 6KB 15KB
[ 1082.591069] RAWv6 18KB 32KB
[ 1082.595439] UDPv6 0KB 31KB
[ 1082.599784] TCPv6 3KB 30KB
[ 1082.604048] scsi_sense_cache 2KB 8KB
[ 1082.606663] sd_ext_cdb 0KB 7KB
[ 1082.608950] sgpool-128 8KB 31KB
[ 1082.611266] sgpool-64 4KB 31KB
[ 1082.613564] sgpool-32 2KB 31KB
[ 1082.615895] sgpool-16 1KB 15KB
[ 1082.618206] sgpool-8 1KB 15KB
[ 1082.620497] mqueue_inode_cache 1KB 30KB
[ 1082.622855] fuse_request 0KB 15KB
[ 1082.625161] jbd2_inode 3KB 31KB
[ 1082.627494] ext4_system_zone 5KB 7KB
[ 1082.629770] bio-1 2KB 15KB
[ 1082.632107] posix_timers_cache 0KB 15KB
[ 1082.634815] UNIX 247KB 270KB
[ 1082.637114] tcp_bind_bucket 1KB 8KB
[ 1082.639447] ip_fib_trie 3KB 7KB
[ 1082.641953] ip_fib_alias 3KB 7KB
[ 1082.644282] ip_dst_cache 8KB 15KB
[ 1082.646603] RAW 14KB 30KB
[ 1082.648916] UDP 6KB 61KB
[ 1082.651257] tw_sock_TCP 0KB 15KB
[ 1082.653524] request_sock_TCP 0KB 15KB
[ 1082.655824] TCP 13KB 29KB
[ 1082.658098] hugetlbfs_inode_cache 2KB 30KB
[ 1082.660550] eventpoll_pwq 58KB 63KB
[ 1082.662848] eventpoll_epi 82KB 94KB
[ 1082.665150] inotify_inode_mark 56KB 63KB
[ 1082.667527] request_queue 100KB 114KB
[ 1082.669820] blkdev_ioc 35KB 47KB
[ 1082.672141] bio-0 36KB 78KB
[ 1082.674441] biovec-max 284KB 403KB
[ 1082.676749] biovec-128 0KB 31KB
[ 1082.679045] biovec-64 0KB 94KB
[ 1082.681300] biovec-16 0KB 47KB
[ 1082.683628] bio_integrity_payload 1KB 15KB
[ 1082.686068] uid_cache 4KB 15KB
[ 1082.688347] dmaengine-unmap-256 2KB 31KB
[ 1082.690706] dmaengine-unmap-128 1KB 31KB
[ 1082.693047] dmaengine-unmap-16 0KB 15KB
[ 1082.695395] dmaengine-unmap-2 0KB 7KB
[ 1082.697667] audit_buffer 0KB 7KB
[ 1082.699992] skbuff_fclone_cache 0KB 15KB
[ 1082.702371] skbuff_head_cache 0KB 62KB
[ 1082.704670] file_lock_cache 3KB 46KB
[ 1082.706956] file_lock_ctx 16KB 30KB
[ 1082.709231] fsnotify_mark_connector 47KB 54KB
[ 1082.711788] shmem_inode_cache 1755KB 1759KB
[ 1082.714093] task_delay_info 65KB 76KB
[ 1082.716382] taskstats 3KB 35KB
[ 1082.718671] proc_dir_entry 204KB 218KB
[ 1082.720940] pde_opener 1KB 15KB
[ 1082.723263] seq_file 2KB 46KB
[ 1082.725526] sigqueue 0KB 7KB
[ 1082.727851] kernfs_iattrs_cache 46KB 47KB
[ 1082.730219] kernfs_node_cache 11447KB 11472KB
[ 1082.732508] mnt_cache 252KB 267KB
[ 1082.734945] filp 8265KB 8277KB
[ 1082.737236] names_cache 8KB 128KB
[ 1082.739565] lsm_file_cache 421KB 472KB
[ 1082.741846] key_jar 42KB 63KB
[ 1082.744161] nsproxy 2KB 7KB
[ 1082.746459] vm_area_struct 2067KB 2135KB
[ 1082.749053] mm_struct 76KB 123KB
[ 1082.751434] fs_cache 22KB 31KB
[ 1082.753743] files_cache 50KB 93KB
[ 1082.756066] signal_cache 195KB 215KB
[ 1082.758359] sighand_cache 276KB 307KB
[ 1082.760636] task_struct 926KB 1008KB
[ 1082.762934] cred_jar 104KB 189KB
[ 1082.765231] anon_vma_chain 967KB 1015KB
[ 1082.767566] anon_vma 758KB 816KB
[ 1082.769850] pid 66KB 80KB
[ 1082.772172] Acpi-Operand 190KB 199KB
[ 1082.774495] Acpi-ParseExt 0KB 7KB
[ 1082.776794] Acpi-Parse 0KB 7KB
[ 1082.779112] Acpi-State 0KB 30KB
[ 1082.781380] Acpi-Namespace 145KB 154KB
[ 1082.783688] numa_policy 1KB 7KB
[ 1082.786246] trace_event_file 571KB 574KB
[ 1082.788527] ftrace_event_field 1278KB 1283KB
[ 1082.790846] pool_workqueue 22KB 31KB
[ 1082.793161] task_group 104KB 123KB
[ 1082.796761] debug_objects_cache 423KB 5507KB
[ 1082.806313] page->ptl 512KB 581KB
[ 1082.812693] dma-kmalloc-512 0KB 15KB
[ 1082.818809] kmalloc-8k 741KB 849KB
[ 1082.823833] kmalloc-4k 718KB 787KB
[ 1082.826081] kmalloc-2k 2223KB 2269KB
[ 1082.828495] kmalloc-1k 916KB 956KB
[ 1082.830782] kmalloc-512 1500KB 1557KB
[ 1082.833084] kmalloc-256 272KB 280KB
[ 1082.835337] kmalloc-192 299KB 303KB
[ 1082.837543] kmalloc-128 273KB 292KB
[ 1082.839819] kmalloc-96 419KB 432KB
[ 1082.842046] kmalloc-64 1439KB 1468KB
[ 1082.844292] kmalloc-32 1138KB 1154KB
[ 1082.846498] kmalloc-16 917KB 925KB
[ 1082.848726] kmalloc-8 859KB 911KB
[ 1082.850946] kmem_cache_node 120KB 126KB
[ 1082.853143] kmem_cache 147KB 158KB
[ 1082.855376] Tasks state (memory values in pages):
[ 1082.857339] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 1082.861076] [ 222] 0 222 23711 1637 192512 152 0 systemd-journal
[ 1082.865122] [ 229] 0 229 26475 0 98304 69 0 lvmetad
[ 1082.868777] [ 238] 0 238 10750 1 110592 360 -1000 systemd-udevd
[ 1082.872627] [ 374] 62583 374 35481 0 184320 155 0 systemd-timesyn
[ 1082.876730] [ 454] 100 454 20010 0 184320 182 0 systemd-network
[ 1082.881330] [ 457] 101 457 17656 0 184320 170 0 systemd-resolve
[ 1082.885300] [ 477] 0 477 7082 0 102400 53 0 atd
[ 1082.889424] [ 479] 0 479 42273 1 225280 1941 0 networkd-dispat
[ 1082.896447] [ 480] 103 480 12510 0 147456 170 -900 dbus-daemon
[ 1082.907430] [ 482] 0 482 23884 0 86016 79 0 lxcfs
[ 1082.914342] [ 485] 0 485 17643 0 176128 186 0 systemd-logind
[ 1082.923099] [ 505] 0 505 7506 0 98304 75 0 cron
[ 1082.930180] [ 506] 102 506 65758 100 163840 239 0 rsyslogd
[ 1082.939265] [ 507] 0 507 71623 24 188416 255 0 accounts-daemon
[ 1082.948271] [ 511] 0 511 27619 0 114688 84 0 irqbalance
[ 1082.958192] [ 514] 0 514 289573 0 299008 3065 -900 snapd
[ 1082.966180] [ 531] 0 531 18073 0 184320 189 -1000 sshd
[ 1082.974004] [ 547] 0 547 46485 0 262144 1999 0 unattended-upgr
[ 1082.982779] [ 560] 0 560 72219 0 204800 218 0 polkitd
[ 1082.990520] [ 565] 0 565 3665 0 73728 37 0 agetty
[ 1082.999367] [ 577] 0 577 3721 0 69632 37 0 agetty
[ 1083.008086] [ 916] 0 916 3665 0 73728 36 0 getty
[ 1083.016685] [ 1148] 0 1148 27534 1 262144 265 0 sshd
[ 1083.023280] [ 1157] 1000 1157 19160 0 196608 279 0 systemd
[ 1083.031170] [ 1158] 1000 1158 27995 0 249856 665 0 (sd-pam)
[ 1083.039290] [ 1303] 1000 1303 27534 0 253952 265 0 sshd
[ 1083.046043] [ 1304] 1000 1304 5395 1 86016 452 0 bash
[ 1083.052429] [ 11976] 0 11976 27534 1 258048 264 0 sshd
[ 1083.059018] [ 12061] 1000 12061 27534 0 249856 264 0 sshd
[ 1083.065966] [ 12062] 1000 12062 5364 1 86016 438 0 bash
[ 1083.074148] [ 12207] 1000 12207 12333 1 135168 1860 0 python3
[ 1083.083793] [ 12303] 1000 12303 1156 0 53248 23 0 sh
[ 1083.092075] [ 12304] 1000 12304 1114 0 53248 20 0 fxmark
[ 1083.100999] [ 12305] 1000 12305 1114 12 53248 11 0 fxmark
[ 1083.110507] [ 12306] 1000 12306 1114 0 53248 20 0 fxmark
[ 1083.120238] [ 12308] 1000 12308 1114 0 53248 30 0 fxmark
....
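To put the Mem-Info counters above in perspective, here is a minimal Python sketch that converts a few of them from pages to MiB. It assumes 4 KiB pages (the x86-64 default) and treats "pages RAM" minus "pages reserved" as the managed memory, which agrees with the DMA and DMA32 `managed:` values in this log. The helper names are illustrative and not part of FxMark or NOVA.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages, the x86-64 default

# Counters copied from the QEMU kernel log above.
MEM_INFO = """\
slab_reclaimable:77402 slab_unreclaimable:13589
mapped:32 shmem:0 pagetables:1169 bounce:0
free:1960 free_pcp:364 free_cma:0
130938 pages RAM
31033 pages reserved
"""

def parse_counters(text):
    """Collect 'name:value' pairs plus the 'N pages RAM/reserved' totals."""
    counters = {name: int(value) for name, value in re.findall(r"(\w+):(\d+)", text)}
    for count, label in re.findall(r"(\d+) pages (RAM|reserved)", text):
        counters["pages_" + label.lower()] = int(count)
    return counters

def pages_to_mib(pages):
    """Convert a page count to MiB using the assumed page size."""
    return pages * PAGE_SIZE / (1024 * 1024)

counters = parse_counters(MEM_INFO)
managed = counters["pages_ram"] - counters["pages_reserved"]

for name in ("slab_reclaimable", "slab_unreclaimable", "free"):
    share = counters[name] / managed
    print(f"{name:20s} {pages_to_mib(counters[name]):8.1f} MiB  {share:6.1%} of managed RAM")
```

Under these assumptions, reclaimable slab comes to about 302 MiB, roughly three quarters of the VM's managed RAM, while free memory is under 8 MiB. The log only itemizes unreclaimable slab caches, so it does not show which reclaimable caches hold that memory.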
The issue is that the FxMark MWUL workload first creates many small files to fill the whole pmem space, then removes (unlinks) them. NOVA is a hybrid filesystem and needs some DRAM for each file, so creating too many small files can result in OOM. This is an issue we want to fix, but I don't have time right now. A workaround would be to limit the number of small files in the MWUL workload.
Thanks, Andiry
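The suggested workaround can be reasoned about with a back-of-the-envelope bound: the number of files is limited both by pmem capacity and by the DRAM that per-file state may consume. The sketch below is only an illustration. The function name, the per-file DRAM cost, the file size, and the DRAM budget are all hypothetical placeholders, not values measured from NOVA or taken from FxMark.

```python
def max_small_files(dram_budget_bytes, per_file_dram_bytes,
                    pmem_bytes, file_size_bytes):
    """Upper bound on the number of files to create.

    The bound is the smaller of what fits in pmem and what the
    DRAM budget allows for per-file in-memory state.
    """
    by_pmem = pmem_bytes // file_size_bytes
    by_dram = dram_budget_bytes // per_file_dram_bytes
    return min(by_pmem, by_dram)

MIB = 1024 * 1024

# Hypothetical numbers for the QEMU setup (512M pmem, 1G DRAM):
# a 128 MiB DRAM budget, 4 KiB of DRAM state per file, 4 KiB files.
limit = max_small_files(
    dram_budget_bytes=128 * MIB,
    per_file_dram_bytes=4096,   # placeholder, not a measured NOVA value
    pmem_bytes=512 * MIB,
    file_size_bytes=4096,       # placeholder file size
)
print(limit)
```

With these placeholder inputs the DRAM budget, not the pmem size, is the limiting factor. A real cap would need the actual per-file DRAM cost of NOVA, which this thread does not state.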
On Sun, Oct 20, 2019 at 6:06 PM Yige Hu wrote:
[Quoted copy of the original report trimmed.]