DSK busy 100%
I looked at atop and noticed disk load ranging from 95% to 100%.
I started to analyze. First I shut down all the working projects on this dedicated server and the load dropped to 15-20%, so I thought the projects were to blame. But that wasn't it: the load returned and climbed to 75-85%, and in atop it was clear that whenever a kworker thread appeared, the disk load instantly spiked.
atop screenshots:
- https://i.stack.imgur.com/r81Wr.png
- https://i.stack.imgur.com/lsd8f.png
- https://i.stack.imgur.com/nQ86t.png
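Since the spikes coincide with kworker activity, one way to narrow this down is to find the kworker threads and dump the busy one's kernel stack. A minimal sketch, scanning /proc directly so it works without ps (inside a container kernel threads may not be visible, in which case it prints nothing):

```shell
# List kworker threads by scanning /proc (kernel threads keep their
# name in /proc/<pid>/comm); prints nothing if none are visible.
for f in /proc/[0-9]*/comm; do
  read -r name < "$f" 2>/dev/null || continue
  case "$name" in
    kworker*) echo "${f%/comm}: $name" ;;
  esac
done
# For a busy one, dump its current kernel stack (root only;
# <pid> is a placeholder for the PID found above):
#   sudo cat /proc/<pid>/stack
```

The stack usually names the workqueue or driver the kworker is servicing, which points at who is issuing the I/O.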
I looked at perf log and perf top and saw:
https://i.stack.imgur.com/1VOxm.png
https://i.stack.imgur.com/KdXFa.png
Drives are healthy; speed results:
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.4319 s, 2.5 GB/s
Timing buffered disk reads: 3878 MB in 3.00 seconds = 1292.39 MB/sec
What can be done next to localize the problem that is loading the disks to 95-100%?
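The throughput figures above look like output from dd and hdparm -t. For reference, a rough sequential-write check of the same shape (the file path /tmp/ddtest is an assumption; point it at a file on the disk under test):

```shell
# Write 64 MiB and fsync, keeping only the throughput summary line
# that dd emits on stderr; /tmp/ddtest is an illustrative path.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=64 conv=fsync 2>&1 | tail -n 1
# Buffered-read timing (needs root; device name is an assumption):
#   sudo hdparm -t /dev/nvme0n1
```

Remove /tmp/ddtest afterwards; note that raw throughput being fine is consistent with the busy% figure itself being wrong, as discussed below.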
Debian 10, kernel 4.19.181-1
The problem looks similar to the one described in this closed GitHub issue; can you suggest options for how to fix it? https://github.com/Atoptool/atop/issues/47
This might be a kernel issue; others have reported that changing the I/O scheduler (elevator) helps:
echo "mq-deadline" | sudo tee /sys/block/nvme*/queue/scheduler
Source: https://github.com/netdata/netdata/issues/5744#issuecomment-724208749
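Note that writing to /sys takes effect immediately but does not survive a reboot. If mq-deadline helps, one common way to persist it is a udev rule; a sketch (the file name 60-ioscheduler.rules is an assumption, any name under /etc/udev/rules.d/ works):

```
# /etc/udev/rules.d/60-ioscheduler.rules  (file name is illustrative)
# Select mq-deadline for every NVMe namespace when the device appears.
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="mq-deadline"
```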
Which scheduler have you been using? See: https://wiki.ubuntu.com/Kernel/Reference/IOSchedulers
Some other software experienced issues because certain drivers did not use unique major / minor device numbers, not sure how many partitions and devices you have and whether that plays any role here. See: https://github.com/netdata/netdata/issues/10841
cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline
cat /sys/block/nvme1n1/queue/scheduler
[none] mq-deadline
I can change it, but I don't understand what consequences to expect. How safe is it to do on a production server?
Also, I don't quite understand how to check for major/minor device number errors. If nvme0n1 is listed higher than nvme1n1, is that a problem?

I can change it, but I don't understand what consequences to expect. How safe is it to do on a production server?
I am no expert on this, but I have never had problems changing it in production. It is designed to be safe to change without rebooting or unmounting, so as far as I know it should only affect performance, not cause any data corruption. See also: https://www.kernel.org/doc/html/latest/block/switching-sched.html
Of course, I don't know what the impact on your server/service would be if performance degraded.
Also, I don't quite understand how to check for major/minor device number errors.
Check whether the output of lsblk shows unique MAJ:MIN numbers.
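A quick way to flag duplicates without eyeballing the lsblk output, reading /proc/partitions (which lists the major and minor number of every block device); the pipeline prints any major:minor pair that occurs more than once, so empty output means no duplicates:

```shell
# Skip the two header lines of /proc/partitions, emit "major:minor"
# per device, and print only pairs that appear more than once.
awk 'NR > 2 && NF { print $1 ":" $2 }' /proc/partitions | sort | uniq -d
```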
If nvme0n1 is listed higher than nvme1n1, is that a problem?
Not that I know of.
Thank you
I will try and let you know
Changing the scheduler really helped, but I didn't stop there.
FYI, this is happening to me on a Proxmox VM where the underlying physical disks on the hypervisor are NVMe. Changing to the mq-deadline scheduler in the VM seems to get rid of the incorrect busy display. (The VM is running Debian Buster.)