High CPU load caused by indexW kernel threads while vdo volume is unused

Open nkichukov opened this issue 4 years ago • 5 comments

Hello team,

I have noticed that the indexW kernel threads consume a lot of CPU time while the VDO volume is idle (for example when it has just been started and has been unused since):

CPU time increase, as reported by 'top':

$ top -bw | grep indexW
6608 root      20   0       0      0      0 S   6.2   0.0   0:01.69 kvdo0:indexW
6609 root      20   0       0      0      0 S   6.2   0.0   0:01.69 kvdo0:indexW
6611 root      20   0       0      0      0 S   6.2   0.0   0:01.69 kvdo0:indexW
6612 root      20   0       0      0      0 S   6.2   0.0   0:01.66 kvdo0:indexW
6613 root      20   0       0      0      0 S   6.2   0.0   0:01.68 kvdo0:indexW
6610 root      20   0       0      0      0 S   0.0   0.0   0:01.69 kvdo0:indexW
6611 root      20   0       0      0      0 S   4.6   0.0   0:01.83 kvdo0:indexW
6608 root      20   0       0      0      0 S   4.3   0.0   0:01.82 kvdo0:indexW
6609 root      20   0       0      0      0 S   4.3   0.0   0:01.82 kvdo0:indexW
6610 root      20   0       0      0      0 S   4.3   0.0   0:01.82 kvdo0:indexW
6612 root      20   0       0      0      0 S   4.3   0.0   0:01.79 kvdo0:indexW
6613 root      20   0       0      0      0 S   4.3   0.0   0:01.81 kvdo0:indexW
6610 root      20   0       0      0      0 S   4.6   0.0   0:01.96 kvdo0:indexW
6613 root      20   0       0      0      0 S   4.6   0.0   0:01.95 kvdo0:indexW
6608 root      20   0       0      0      0 S   4.3   0.0   0:01.95 kvdo0:indexW
6609 root      20   0       0      0      0 S   4.3   0.0   0:01.95 kvdo0:indexW
6611 root      20   0       0      0      0 S   4.3   0.0   0:01.96 kvdo0:indexW
6612 root      20   0       0      0      0 S   4.0   0.0   0:01.91 kvdo0:indexW
6613 root      20   0       0      0      0 S   4.6   0.0   0:02.09 kvdo0:indexW
6608 root      20   0       0      0      0 S   4.3   0.0   0:02.08 kvdo0:indexW
6609 root      20   0       0      0      0 S   4.3   0.0   0:02.08 kvdo0:indexW
6610 root      20   0       0      0      0 S   4.3   0.0   0:02.09 kvdo0:indexW
6611 root      20   0       0      0      0 S   4.3   0.0   0:02.09 kvdo0:indexW
6612 root      20   0       0      0      0 S   4.3   0.0   0:02.04 kvdo0:indexW
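To put a number on the growth in the TIME+ column above, here is a minimal sketch that computes, per thread, how much CPU time accrued between the first and last sample. It assumes the default `top` batch layout shown above (PID in the first column, TIME+ in the eleventh as minutes:seconds, command in the twelfth). The function names and the `SAMPLE` excerpt are illustrative only.

```python
def parse_cpu_time(field):
    """Convert a top TIME+ value such as '0:01.69' (min:sec) to seconds."""
    minutes, seconds = field.split(":")
    return int(minutes) * 60 + float(seconds)


def cpu_time_deltas(top_output, name="indexW"):
    """Return {pid: seconds of CPU time gained between first and last sample}."""
    first, last = {}, {}
    for line in top_output.splitlines():
        fields = line.split()
        # Assumed layout: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) < 12 or name not in fields[11]:
            continue
        pid = int(fields[0])
        seconds = parse_cpu_time(fields[10])
        first.setdefault(pid, seconds)  # keep the earliest sample
        last[pid] = seconds             # overwrite with the latest sample
    return {pid: round(last[pid] - first[pid], 2) for pid in first}


# First and last samples for two of the threads listed above.
SAMPLE = """\
6608 root      20   0       0      0      0 S   6.2   0.0   0:01.69 kvdo0:indexW
6612 root      20   0       0      0      0 S   6.2   0.0   0:01.66 kvdo0:indexW
6608 root      20   0       0      0      0 S   4.3   0.0   0:02.08 kvdo0:indexW
6612 root      20   0       0      0      0 S   4.3   0.0   0:02.04 kvdo0:indexW
"""

deltas = cpu_time_deltas(SAMPLE)
```

For the excerpt, `deltas` maps PID 6608 to 0.39 and PID 6612 to 0.38 seconds of CPU time accrued while the volume was idle.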

VDO statistics after collecting the usage above:

# vdostats --verbose | grep -e 'bios in\|bios out'
  bios in read                        : 0
  bios in write                       : 0
  bios in discard                     : 0
  bios in flush                       : 0
  bios in fua                         : 0
  bios in partial read                : 0
  bios in partial write               : 0
  bios in partial discard             : 0
  bios in partial flush               : 0
  bios in partial fua                 : 0
  bios out read                       : 0
  bios out write                      : 0
  bios out discard                    : 0
  bios out flush                      : 0
  bios out fua                        : 0
  bios out completed read             : 0
  bios out completed write            : 0
  bios out completed discard          : 0
  bios out completed flush            : 0
  bios out completed fua              : 0
  bios in progress read               : 0
  bios in progress write              : 0
  bios in progress discard            : 0
  bios in progress flush              : 0
  bios in progress fua                : 0
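The idle check above can also be scripted. The following is a rough sketch that parses `name : value` lines in the format shown and reports whether every `bios` counter is zero. It assumes the output has already been captured as text; `parse_counters` and `is_idle` are hypothetical helper names, not part of the VDO tooling.

```python
def parse_counters(text):
    """Parse 'name : value' lines into a dict, keeping integer values only."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def is_idle(counters):
    """True if at least one 'bios ...' counter exists and all of them are zero."""
    bios = [v for k, v in counters.items() if k.startswith("bios ")]
    return bool(bios) and all(v == 0 for v in bios)


# Excerpt in the same layout as the vdostats output above.
SAMPLE = """\
  bios in read                        : 0
  bios in write                       : 0
  bios out read                       : 0
  bios in progress write              : 0
"""

idle = is_idle(parse_counters(SAMPLE))
```

With the excerpt, `idle` is `True`, which matches the observation that no I/O had reached the volume.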

System information:
GNU/Gentoo Linux (amd64)
kvdo version: 6.2.3.114 on 5.8.0 kernel (amd64) (kvdo-corp)
GCC: 10.2

The VDO device is started on top of a dm-crypt encrypted partition, i.e.:

    vdo_storage: !VDOService
      _operationState: finished
      ackThreads: 1
      activated: enabled
      bioRotationInterval: 64
      bioThreads: 4
      blockMapCacheSize: 128M
      blockMapPeriod: 16380
      compression: enabled
      cpuThreads: 2
      deduplication: enabled
      device: /dev/mapper/cryptstorage
      hashZoneThreads: 1
      indexCfreq: 0
      indexMemory: 0.25
      indexSparse: disabled
      indexThreads: 0
      logicalBlockSize: 4096
      logicalSize: 1T
      logicalThreads: 1
      maxDiscardSize: 4K
      name: vdo_storage
      physicalSize: 306742596K
      physicalThreads: 1
      slabSize: 2G
      uuid: null
      writePolicy: async

Hardware specs of the machine:
Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz
32GB memory
disk partition on NVMe disk

There is currently no virtualization in use, though KVM and linux-containers are compiled in as modules.

If there is anything else you need to know, please let me know.

nkichukov avatar Aug 18 '20 08:08 nkichukov

Hello,

Thanks for the report; I was able to reproduce this behavior on a system running Fedora 32 with a 5.7 kernel.

Here's some output from "vmstat 1" after creating a new VDO volume directly on a test block device (i.e.: no layers below the VDO volume), with the command vdo create --name=vdo1 --device=<testdevice> --vdoLogicalSize=1T:

procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 30105680  66792 1406196    0    0     0   388 85875 185522  0  1 99  0  0
 1  0      0 30105680  66792 1406204    0    0     0     0 85105 184946  0  1 99  0  0
 2  0      0 30105680  66792 1406204    0    0     0     0 93577 193874  0  1 99  0  0
 1  0      0 30105680  66792 1406204    0    0     0     0 99219 199766  0  1 99  0  0
 1  0      0 30105680  66792 1406204    0    0     0     0 99274 199817  0  1 99  0  0
 1  0      0 30105680  66792 1406204    0    0     0     0 99055 199591  0  1 99  0  0
 1  0      0 30105680  66792 1406204    0    0     0     0 99294 199809  0  1 99  0  0
 1  0      0 30105680  66800 1406204    0    0     0    16 97547 197933  0  2 98  0  0

Note the high number of context switches per second ("cs"). If you run vmstat 1 on your system with the VDO volume remaining idle, do you see something similar?
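For anyone who wants to measure the same context-switch rate without `vmstat`, here is a small sketch that samples the `ctxt` line of `/proc/stat` (the system-wide count of context switches since boot) twice and divides the difference by the interval. The function names are illustrative, and the rate is system-wide, so it includes every other process as well as the VDO threads.

```python
import time


def read_ctxt(stat_text):
    """Extract the cumulative context-switch count from /proc/stat contents."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no ctxt line found")


def context_switch_rate(interval=1.0, path="/proc/stat"):
    """Return system-wide context switches per second over `interval` seconds."""
    with open(path) as f:
        before = read_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_ctxt(f.read())
    return (after - before) / interval
```

Calling `context_switch_rate()` once with the VDO volume stopped and once with it started (and idle) gives two figures that can be compared directly.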

(In my case, the VDO volume's index also had 6 zones; the test system has a total of 12 CPUs.)

bgurney-rh avatar Aug 18 '20 21:08 bgurney-rh

Hello Bryan,

This is confirmed: the number of context switches increased immediately, by approximately 180000, just from bringing up the VDO device. Turning it off brings the number of context switches back to normal.

nkichukov avatar Aug 19 '20 09:08 nkichukov

The same issue also happens on 5.5.10 amd64 kernels with uds version 8.0.0.84 and kvdo version 6.2.2.117.

nkichukov avatar Aug 26 '20 14:08 nkichukov

Hi @nkichukov,

Thanks for confirming. We are investigating this further and have opened BZ1870660 to track this.

We believe this to be due to a bug that we recently fixed in another branch and will be working to further confirm this and apply the fix to the necessary releases.

rhawalsh avatar Aug 26 '20 14:08 rhawalsh

I've tested with package vdo-6.2.4.14-14.el8. The issue still persists: a high rate of CPU context switching still occurs.

pavel-z1 avatar Dec 11 '20 09:12 pavel-z1