etcd: higher disk latency and round trip time since 2765.2.0
We're seeing slower network peer round trip times for our bare metal etcd nodes since the update from stable 2605.12.0 to 2765.2.0 (and now 2765.2.1). This correlates with an increase in WAL fsync and backend commit durations. The 99th percentile of etcd_network_peer_round_trip_time_seconds has increased from an average of ~100ms to ~400ms.
Rolling back to 2605.12.0 restores the previous level of performance; rolling forward again reproduces the issue, so the Flatcar version is very much the relevant variable.
Here are some graphs that illustrate the issue. Our nodes rolled into 2765.2.0 on March 4th.
Graphs (omitted here): round trip time, WAL fsync duration, backend commit duration, weighted IOs.
I ran fio on one of the nodes on both 2605.12.0 and 2765.2.1 for comparison, using the arguments suggested here:
fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/etcd/tmp --size=22m --bs=2300 --name=etcd
2605.12.0:
etcd-2605.12.0-0: (g=0): rw=write, bs=(R) 2300B-2300B, (W) 2300B-2300B, (T) 2300B-2300B, ioengine=sync, iodepth=1
fio-3.26-19-ge7e53
Starting 1 process
etcd-2605.12.0-0: Laying out IO file (1 file / 22MiB)
etcd-2605.12.0-0: (groupid=0, jobs=1): err= 0: pid=19: Fri Mar 19 09:51:24 2021
write: IOPS=1508, BW=3387KiB/s (3469kB/s)(22.0MiB/6650msec); 0 zone resets
clat (usec): min=4, max=121, avg=14.26, stdev= 6.26
lat (usec): min=4, max=122, avg=14.65, stdev= 6.48
clat percentiles (usec):
| 1.00th=[ 7], 5.00th=[ 8], 10.00th=[ 9], 20.00th=[ 10],
| 30.00th=[ 11], 40.00th=[ 12], 50.00th=[ 15], 60.00th=[ 15],
| 70.00th=[ 15], 80.00th=[ 19], 90.00th=[ 24], 95.00th=[ 26],
| 99.00th=[ 35], 99.50th=[ 36], 99.90th=[ 42], 99.95th=[ 109],
| 99.99th=[ 114]
bw ( KiB/s): min= 3014, max= 3688, per=99.87%, avg=3383.69, stdev=217.19, samples=13
iops : min= 1342, max= 1642, avg=1506.69, stdev=96.73, samples=13
lat (usec) : 10=25.62%, 20=58.81%, 50=15.50%, 100=0.02%, 250=0.06%
fsync/fdatasync/sync_file_range:
sync (usec): min=217, max=12143, avg=645.20, stdev=404.94
sync percentiles (usec):
| 1.00th=[ 225], 5.00th=[ 233], 10.00th=[ 253], 20.00th=[ 265],
| 30.00th=[ 293], 40.00th=[ 355], 50.00th=[ 734], 60.00th=[ 799],
| 70.00th=[ 848], 80.00th=[ 906], 90.00th=[ 979], 95.00th=[ 1045],
| 99.00th=[ 1876], 99.50th=[ 1942], 99.90th=[ 2245], 99.95th=[ 4228],
| 99.99th=[11469]
cpu : usr=1.56%, sys=9.46%, ctx=15664, majf=0, minf=15
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10029,0,0 short=10029,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=3387KiB/s (3469kB/s), 3387KiB/s-3387KiB/s (3469kB/s-3469kB/s), io=22.0MiB (23.1MB), run=6650-6650msec
Disk stats (read/write):
sda: ios=0/21265, merge=0/11251, ticks=0/5707, in_queue=47, util=98.59%
2765.2.1:
etcd-2765.2.1-0: (g=0): rw=write, bs=(R) 2300B-2300B, (W) 2300B-2300B, (T) 2300B-2300B, ioengine=sync, iodepth=1
fio-3.26-19-ge7e53
Starting 1 process
etcd-2765.2.1-0: Laying out IO file (1 file / 22MiB)
etcd-2765.2.1-0: (groupid=0, jobs=1): err= 0: pid=104: Fri Mar 19 09:23:09 2021
write: IOPS=1330, BW=2989KiB/s (3060kB/s)(22.0MiB/7537msec); 0 zone resets
clat (usec): min=4, max=129, avg=16.99, stdev= 8.10
lat (usec): min=5, max=130, avg=17.43, stdev= 8.28
clat percentiles (usec):
| 1.00th=[ 7], 5.00th=[ 8], 10.00th=[ 9], 20.00th=[ 10],
| 30.00th=[ 12], 40.00th=[ 14], 50.00th=[ 15], 60.00th=[ 20],
| 70.00th=[ 23], 80.00th=[ 23], 90.00th=[ 26], 95.00th=[ 28],
| 99.00th=[ 38], 99.50th=[ 45], 99.90th=[ 117], 99.95th=[ 122],
| 99.99th=[ 127]
bw ( KiB/s): min= 2479, max= 3539, per=100.00%, avg=2990.80, stdev=344.45, samples=15
iops : min= 1104, max= 1576, avg=1331.73, stdev=153.37, samples=15
lat (usec) : 10=22.56%, 20=39.68%, 50=37.45%, 100=0.18%, 250=0.12%
fsync/fdatasync/sync_file_range:
sync (usec): min=216, max=10954, avg=729.63, stdev=452.31
sync percentiles (usec):
| 1.00th=[ 221], 5.00th=[ 227], 10.00th=[ 258], 20.00th=[ 281],
| 30.00th=[ 322], 40.00th=[ 404], 50.00th=[ 775], 60.00th=[ 914],
| 70.00th=[ 979], 80.00th=[ 1057], 90.00th=[ 1188], 95.00th=[ 1319],
| 99.00th=[ 2008], 99.50th=[ 2114], 99.90th=[ 2474], 99.95th=[ 4686],
| 99.99th=[ 9896]
cpu : usr=2.03%, sys=9.98%, ctx=19296, majf=0, minf=13
IO depths : 1=200.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10029,0,0 short=10029,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=2989KiB/s (3060kB/s), 2989KiB/s-2989KiB/s (3060kB/s-3060kB/s), io=22.0MiB (23.1MB), run=7537-7537msec
Disk stats (read/write):
sda: ios=0/20836, merge=0/11018, ticks=0/5973, in_queue=5973, util=98.73%
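For anyone who wants a quick cross-check without fio installed: GNU dd with oflag=dsync forces a data sync after each write, loosely comparable to fio's --fdatasync=1. This is a hedged sketch; the DIR default is an assumption and should point at the volume etcd actually uses.

```shell
# Hedged sketch: time small synchronous writes with GNU dd.
# oflag=dsync syncs data after every write, similar in spirit to fio --fdatasync=1.
# DIR is an assumption -- point it at the etcd data volume; fall back to a
# temp dir if that path is unavailable or not writable.
DIR=${DIR:-/var/lib/etcd/tmp}
mkdir -p "$DIR" 2>/dev/null && [ -w "$DIR" ] || DIR=$(mktemp -d)
summary=$(dd if=/dev/zero of="$DIR/ddtest" bs=2300 count=1000 oflag=dsync 2>&1 | tail -n 1)
echo "$summary"    # dd's summary line, e.g. "2300000 bytes ... copied, <time>, <rate>"
rm -f "$DIR/ddtest"
```

The single throughput number is only a coarse signal; fio's sync percentiles above remain the better comparison.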
And here's the output of some commands that might provide some insight into our setup. Please let me know if there's any other information that could be helpful.
$ etcd --version
etcd Version: 3.4.13
Git SHA: ae9734ed2
Go Version: go1.12.17
Go OS/Arch: linux/amd64
$ mount | grep sda1
/dev/sda1 on /var/lib/etcd type ext4 (rw,relatime,seclabel)
$ sudo fdisk -l /dev/sda1
Disk /dev/sda1: 23.9 GiB, 25605806080 bytes, 50011340 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
$ sudo lshw -c disk -c storage
*-storage
description: SATA controller
product: 8 Series/C220 Series Chipset Family 6-port SATA Controller 1 [AHCI mode]
vendor: Intel Corporation
physical id: 1f.2
bus info: pci@0000:00:1f.2
version: 05
width: 32 bits
clock: 66MHz
capabilities: storage msi pm ahci_1.0 bus_master cap_list
configuration: driver=ahci latency=0
resources: irq:25 ioport:3040(size=8) ioport:3048(size=4) ioport:3050(size=8) ioport:3058(size=4) ioport:3060(size=32) memory:fb7e0000-fb7e07ff
*-scsi
physical id: 1
logical name: scsi4
capabilities: emulated
*-disk
description: ATA Disk
product: ORIGIN Inception
physical id: 0.0.0
bus info: scsi@4:0.0.0
logical name: /dev/sda
version: 3A0
serial: 522492002150124
size: 238GiB (256GB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=9e5b4701-156e-4856-a8f1-5ab1e1f7f5f7 sectorsize=512
We also see a ~15% scimark2 benchmark degradation after upgrading to 2765.2.0+.
Could you check whether any of the block I/O schedulers affect the disk latency for you? See /sys/block/sda/queue/scheduler; the default scheduler may have changed between releases.
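A minimal sketch for that check (device names vary per machine; sda is just this host's disk):

```shell
# List the active I/O scheduler for every visible block device.
# The bracketed entry in each scheduler file is the one currently in use.
for f in /sys/block/*/queue/scheduler; do
    [ -e "$f" ] || continue              # no block devices visible (e.g. in a container)
    printf '%s: %s\n' "$f" "$(cat "$f")"
done
# To switch for a quick test (assumption: mq-deadline is built into your kernel):
#   echo mq-deadline | sudo tee /sys/block/sda/queue/scheduler
```

A change made this way does not persist across reboots, so it's safe for A/B testing the latency.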