QAT_Engine
Question about memory
(1) What is the purpose of qat_contig_mem.ko? I simply ran rmmod on it while Nginx was still running and found that Nginx kept working and reloaded correctly.
(2) A few days after I did (1), I found Nginx didn't work any more; dmesg shows:
[1564771.408920] Nginx: page allocation failure: order:9, mode:0x3040d0
[1564771.416519] CPU: 29 PID: 13537 Comm: Nginx Tainted: G OE ------------ 3.10.0-327.el7.x86_64 #1
[1564771.428579] Hardware name: Huawei Technologies Co., Ltd. Tecal RH2288H V2-12L/BC11SRSG1, BIOS RMIBV503 03/09/2015
[1564771.439347] Call Trace:
[1564771.442148] [<ffffffff81650e6a>] dump_stack+0x19/0x1b
[1564771.447628] [<ffffffff81179ac0>] warn_alloc_failed+0x110/0x180
[1564771.453892] [<ffffffff8164b57d>] ? __alloc_pages_direct_compact+0x186/0x1b5
[1564771.461452] [<ffffffff8164bb9b>] __alloc_pages_slowpath+0x5ef/0x826
[1564771.468152] [<ffffffff8117dcba>] __alloc_pages_nodemask+0x41a/0x440
[1564771.474841] [<ffffffff8164d062>] kmalloc_large_node+0x60/0x8d
[1564771.481017] [<ffffffff811d1c32>] __kmalloc_node+0x222/0x280
[1564771.487006] [<ffffffff81176265>] ? filemap_fault+0x225/0x430
[1564771.493290] [<ffffffffa0502dfc>] dev_mem_alloc.isra.6+0x12c/0x5f0 [usdm_drv]
[1564771.501108] [<ffffffffa0503427>] mem_ioctl+0x167/0x200 [usdm_drv]
[1564771.507752] [<ffffffff81204575>] do_vfs_ioctl+0x2e5/0x4c0
[1564771.513670] [<ffffffff812047f1>] SyS_ioctl+0xa1/0xc0
[1564771.519113] [<ffffffff816630fd>] system_call_fastpath+0x16/0x1b
[1564771.525535] Mem-Info:
[1564772.301905] usdm_drv: userMemAlloc:380 Unable to allocate memory slab or wrong alignment: (null)
[1564772.312064] usdm_drv: dev_mem_alloc:566 userMemAlloc failed
It seemed like a memory leak, but the memory usage of Nginx was low (I even restarted Nginx, but it still didn't work).
Then I found that Cached in /proc/meminfo was very large. Yes, I do a lot of logging, and I assumed the system would reclaim that memory.
(3) I removed the log file, the Cached value in /proc/meminfo recovered, and after restarting Nginx it works now!
(4) Is this related to my rmmod of qat_contig_mem?
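For what it's worth, the effect of deleting the large log file in step (3) can also be triggered directly. Below is a minimal C sketch, assuming root privileges and a kernel built with CONFIG_COMPACTION; the /proc paths are the standard kernel interfaces, everything else is illustrative:

/* Sketch: ask the kernel to drop clean page cache and compact free pages,
 * which is what deleting the large log file achieved indirectly. */
#include <stdio.h>

static int poke(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");
    if (!f) {
        perror(path);
        return -1;
    }
    fputs(val, f);
    fclose(f);
    return 0;
}

int main(void)
{
    poke("/proc/sys/vm/drop_caches", "3");    /* drop page cache + dentries/inodes */
    poke("/proc/sys/vm/compact_memory", "1"); /* defragment free page blocks */
    return 0;
}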
Error log:
[WARNING][4964:e_qat.c:350:qat_engine_init()] icp_sal_userStart failed
After I sent a request to worker 4964, the error log became as follows:
[WARNING][4964:e_qat.c:329:qat_engine_init()] pthread_key_create failed: Resource temporarily unavailable
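For context: pthread_key_create() returns EAGAIN ("Resource temporarily unavailable") once the process runs out of thread-specific-data keys (PTHREAD_KEYS_MAX, typically 1024 on glibc), for example if engine initialization is retried repeatedly without the old keys being deleted. A minimal standalone sketch, not QAT Engine code, that reproduces the message (build with -pthread):

/* Exhaust the process's thread-specific-data keys to reproduce the
 * "Resource temporarily unavailable" (EAGAIN) error from the log. */
#include <stdio.h>
#include <string.h>
#include <pthread.h>

int main(void)
{
    pthread_key_t key;
    int rc, created = 0;

    while ((rc = pthread_key_create(&key, NULL)) == 0)
        created++;

    printf("created %d keys before failure: %s\n", created, strerror(rc));
    return 0;
}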
Hi @mrpre,
Sorry I've not been available for a while. When you run ./configure for the QAT Engine, are you passing it --enable-usdm? If you are (i.e. you are using the USDM Memory Driver), then you have no need for qat_contig_mem and can safely rmmod it.
If you are using the Latest QAT Driver then the QAT Driver itself uses USDM for pinned contiguous memory allocations for DMAing to the QAT hardware. The QAT Engine on the other hand can either use qat_contig_mem (supplied with the QAT Engine) or also make use of the USDM Memory Driver. It is recommended to use the USDM Memory Driver as it is a higher quality code base with more features. The USDM Memory Driver works by allocating 2MB slabs and dividing them up for individual allocations. The failures you are seeing in your log are from not being able to allocate a 2MB slab of contiguous memory from the kernel. This can happen for the following reasons:
- Fragmentation - Over time the memory available for allocation becomes more fragmented and eventually you run out of free 2MB slabs. Running cat /proc/buddyinfo will tell you how many free blocks of each order are available on your system (see the sketch after this list).
- System not releasing resources - Sometimes it looks like you have a memory leak, but it is just resources that are being cached or haven't been released correctly. This can impact the available slabs, but from a personal perspective I've not seen a situation where the system actually runs out of slabs because of it.
- A genuine memory leak - this would cause a depletion over time.
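To check for the fragmentation case concretely, here is a small sketch that reads /proc/buddyinfo and counts the free blocks of order 9 and above, i.e. the 2MB-or-larger physically contiguous chunks each zone can still hand out (assuming 4KB pages, so order 9 = 512 pages = 2MB). The parsing is illustrative, not part of any QAT tooling:

/* Count free blocks of order >= 9 (>= 2MB with 4KB pages) per zone. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/buddyinfo", "r");
    char line[512];

    if (!f) {
        perror("/proc/buddyinfo");
        return 1;
    }
    while (fgets(line, sizeof line, f)) {
        /* Lines look like: "Node 0, zone   Normal   273   237 ..." */
        char *p = strstr(line, "zone");
        if (!p)
            continue;
        strtok(p, " \t\n");                  /* skip the word "zone" */
        char *zone = strtok(NULL, " \t\n");  /* zone name, e.g. Normal */
        long big = 0;
        int order = 0;
        while ((p = strtok(NULL, " \t\n")) != NULL) {
            if (order >= 9)                  /* orders 9+ are >= 2MB */
                big += atol(p);
            order++;
        }
        printf("zone %-8s: %ld free blocks of 2MB or larger\n", zone, big);
    }
    fclose(f);
    return 0;
}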
It looks like you are suffering from item 2, but unfortunately I don't have answers for what you can do about that. I don't believe it is related to your removal of qat_contig_mem. If you were still using qat_contig_mem then I would have expected it to complain or crash straight away rather than later on.
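To make the 2MB-slab model above concrete, here is an illustrative user-space sketch of the carve-up idea. It is not the USDM source: real USDM obtains its slabs from the kernel (which is where the order:9 allocations in your log fail) and also tracks and frees individual pieces:

/* Illustrative sketch of "allocate a 2MB slab once, carve it up". */
#include <stdint.h>
#include <stdlib.h>

#define SLAB_SIZE (2UL * 1024 * 1024)  /* 2MB = order 9 in 4KB pages */
#define ALIGN     64                   /* per-piece alignment */

struct slab {
    uint8_t *base;   /* start of the contiguous 2MB region */
    size_t   used;   /* bump-pointer offset */
};

static int slab_init(struct slab *s)
{
    /* In the kernel this is the step that fails on fragmented systems,
     * i.e. "Unable to allocate memory slab" in the log. */
    s->base = aligned_alloc(SLAB_SIZE, SLAB_SIZE);
    s->used = 0;
    return s->base ? 0 : -1;
}

static void *slab_alloc(struct slab *s, size_t n)
{
    size_t off = (s->used + ALIGN - 1) & ~(size_t)(ALIGN - 1);
    if (off + n > SLAB_SIZE)
        return NULL;   /* slab full: a real allocator grabs another slab */
    s->used = off + n;
    return s->base + off;
}

int main(void)
{
    struct slab s;
    if (slab_init(&s) != 0)
        return 1;
    void *buf = slab_alloc(&s, 4096);  /* one small sub-allocation */
    return buf ? 0 : 1;
}

The point is only that one high-order allocation up front lets many small pinned buffers be handed out without hitting the kernel allocator individually.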
Kind Regards,
Steve.
@stevelinsell Hello Steve, I have a similar issue. At userMemAlloc:380 it may be that the memory is allocated but fails the alignment check ("wrong alignment"). What is the likely root cause? Is it still caused by USDM failing to allocate a 2MB slab? Many thanks in advance.
Besides, if the wrong alignment is caused by USDM failing to allocate a 2MB slab, can I replace kmalloc() with kvzalloc() in the QAT driver/USDM? Does the QAT driver/USDM require "physically contiguous memory in the kernel's own address space"? https://lwn.net/Articles/711653/
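Not an authoritative answer, but for reference: kvzalloc() may fall back to vmalloc(), which returns memory that is only virtually contiguous, while DMA to the QAT hardware needs pinned, physically contiguous buffers (as Steve noted above), so it is unlikely to be a drop-in replacement for kmalloc() there. A kernel-style sketch of the distinction, not actual QAT driver code:

/* Kernel-style sketch of why kvzalloc() is not a drop-in for a DMA path. */
#include <linux/slab.h>
#include <linux/mm.h>

static void *alloc_for_dma(size_t len)
{
    /* kmalloc() returns physically contiguous memory, which is what the
     * device's DMA engine needs; the cost is that high-order requests
     * (order 9 for 2MB) fail on fragmented systems. */
    return kmalloc(len, GFP_KERNEL);
}

static void *alloc_cpu_only(size_t len)
{
    /* kvzalloc() may satisfy the request via vmalloc(): the buffer is
     * virtually contiguous but can span scattered physical pages, so it
     * cannot be handed to the device as a single DMA region. */
    return kvzalloc(len, GFP_KERNEL);
}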
Version information:
- QAT driver: QAT1.7.Upstream.L.1.0.3-42
- QAT engine: QAT_Engine-0.5.40, using qat_contig_mem as the memory driver
The Intel QAT driver uses USDM to allocate memory. As noted above: "The USDM Memory Driver works by allocating 2MB slabs and dividing them up for individual allocations. The failures you are seeing in your log are from not being able to allocate a 2MB slab of contiguous memory from the kernel."
The system is highly fragmented; there are few free chunks of 2MB or greater:
cat /proc/buddyinfo
Node 0, zone DMA 2 1 0 1 3 2 2 1 2 0 0 0
Node 0, zone DMA32 3721 455 163 120 41 15 7 3 2 7 5 1
Node 0, zone Normal 273 237 187 128 66 32 28 5 4 0 0 0
Nov 26 13:01:27 user.err kernel: [4999514.599784] userMemAlloc:380 Unable to allocate memory slab or wrong alignment: 000000006faebeda
Nov 26 13:01:27 user.err kernel: [4999514.707096] dev_mem_alloc:566 userMemAlloc failed
Nov 26 13:01:27 user.info kernel: [4999514.765474] xxxxx[15568]: segfault at 40 ip 00007f8ffa1938d0 sp 00007ffc35618518 error 4 in libjemalloc.so.2[7f8ffa13b000+76000]
Nov 26 13:01:27 user.info kernel: [4999514.765478] Code: 66 2e 0f 1f 84 00 00 00 00 00 48 8b 06 48 8d 15 06 15 02 00 48 c1 e8 12 0f b6 c8 48 8b 04 ca c3 66 2e 0f 1f 84 00 00 00 00 00 <48> 8b 46 40 c3 66 66 2e 0f 1f 84 00 00 00 00 00 48 89 56 40 c3 66
Nov 26 13:01:29 user.warn kernel: [4999516.600569] warn_alloc: 4 callbacks suppressed
Nov 26 13:01:29 user.warn kernel: [4999516.600570] xxxxx: page allocation failure: order:9, mode:0x40cc0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
Nov 26 13:01:29 user.debug kernel: [4999516.600572] CPU: 2 PID: 15572 Comm: xxxxx Kdump: loaded Tainted: G W O 5.4.0 1
Nov 26 13:01:29 user.debug kernel: [4999516.600573] Hardware name: GIGABYTE MN32-EC2-F5/MN32-EC2-F5, BIOS F03 06/04/2019
Nov 26 13:01:29 user.debug kernel: [4999516.600573] Call Trace:
Nov 26 13:01:29 user.debug kernel: [4999516.600577] dump_stack+0x50/0x70
Nov 26 13:01:29 user.debug kernel: [4999516.600578] warn_alloc.cold+0x73/0xd7
Nov 26 13:01:29 user.debug kernel: [4999516.600580] __alloc_pages_slowpath+0x8e3/0xaa0
Nov 26 13:01:29 user.debug kernel: [4999516.600581] ? cdev_put.part.0+0x20/0x20
Nov 26 13:01:29 user.debug kernel: [4999516.600582] __alloc_pages_nodemask+0x222/0x250
Nov 26 13:01:29 user.debug kernel: [4999516.600584] kmalloc_large_node+0x40/0xa0
Nov 26 13:01:29 user.debug kernel: [4999516.600585] __kmalloc_node+0x12b/0x290
Nov 26 13:01:29 user.debug kernel: [4999516.600587] 0xffffffffa02848cd
Nov 26 13:01:29 user.debug kernel: [4999516.600588] 0xffffffffa02849cf
Nov 26 13:01:29 user.debug kernel: [4999516.600589] do_vfs_ioctl+0x3e4/0x640
Nov 26 13:01:29 user.debug kernel: [4999516.600590] ksys_ioctl+0x3a/0x70
Nov 26 13:01:29 user.debug kernel: [4999516.600592] __x64_sys_ioctl+0x16/0x20
Nov 26 13:01:29 user.debug kernel: [4999516.600593] do_syscall_64+0x68/0x3c0
Nov 26 13:01:29 user.debug kernel: [4999516.600594] ? __do_page_fault+0x23d/0x480
Nov 26 13:01:29 user.debug kernel: [4999516.600596] entry_SYSCALL_64_after_hwframe+0x44/0xa9
2. Search for the regular expression "^.started . acceleration engines\n" in the dmesg info. The configuration file sets:
[SHIM]
NumberCyInstances = 22
But it seems that only 16 Crypto instances are obtained each time:
Line 887: Sep 12 03:39:16 user.info kernel: [ 49.046552] c6xx 0000:53:00.0: qat_dev0 started 8 acceleration engines
Line 893: Sep 12 03:39:16 user.info kernel: [ 50.768547] c6xx 0000:53:00.0: qat_dev0 started 8 acceleration engines
Line 1818: Sep 25 04:41:17 user.info kernel: [ 48.279127] c6xx 0000:53:00.0: qat_dev0 started 8 acceleration engines
Line 1824: Sep 25 04:41:17 user.info kernel: [ 50.000120] c6xx 0000:53:00.0: qat_dev0 started 8 acceleration engines
Line 3075: Sep 29 17:17:18 user.info kernel: [ 48.391143] c6xx 0000:53:00.0: qat_dev0 started 8 acceleration engines
Line 3081: Sep 29 17:17:18 user.info kernel: [ 50.114146] c6xx 0000:53:00.0: qat_dev0 started 8 acceleration engines