seastar
iotune: Random IO buffer size is not always correct
From https://github.com/scylladb/scylladb/issues/13477 : running the random-write test with the auto-selected 512-byte (or even 1k) buffer makes the i4i instance's drive do read-modify-write, dropping the resulting IOPS rate. Need to make the random-write block size larger ... somehow
There's the physical block size in /sys/block/nvme0n1/queue/physical_block_size.
Note: we still want to write the commitlog with logical block size, there's hope it avoids RMW since it's a stream.
Not optimal_io_size? (assuming it's not 0, which it often is...)
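For illustration, here is a minimal sketch (not iotune's actual code) of how those queue attributes could be combined to pick a larger random-IO buffer size, falling back through the attributes when optimal_io_size is 0:

```cpp
#include <algorithm>
#include <cstdint>
#include <fstream>
#include <string>

// Read a single numeric attribute from /sys/block/<dev>/queue/,
// returning `fallback` if the file is missing or unreadable.
static uint64_t read_queue_attr(const std::string& dev, const std::string& attr,
                                uint64_t fallback) {
    std::ifstream f("/sys/block/" + dev + "/queue/" + attr);
    uint64_t v = 0;
    return (f >> v) ? v : fallback;
}

// Illustrative heuristic only: take the largest of the advertised sizes,
// never going below the logical block size.
uint64_t pick_random_io_buffer_size(const std::string& dev) {
    uint64_t logical  = read_queue_attr(dev, "logical_block_size", 512);
    uint64_t physical = read_queue_attr(dev, "physical_block_size", logical);
    uint64_t optimal  = read_queue_attr(dev, "optimal_io_size", 0);
    return std::max({logical, physical, optimal});
}
```

Note that on the i4i drive examined later in this thread all three attributes report 512/512/0, so a heuristic like this alone would still pick 512.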
Finally found what I was looking for. From https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-optimized-instances.html#storage-instances-diskperf :
This decrease in performance is even larger if the write operations are not in multiples of 4,096 bytes or not aligned to a 4,096-byte boundary. If you write a smaller amount of bytes or bytes that are not aligned, the SSD controller must read the surrounding data and store the result in a new location. This pattern results in significantly increased write amplification, increased latency, and dramatically reduced I/O performance.
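In other words, a write avoids read-modify-write on these drives only if both its offset and length are multiples of 4,096 bytes. A trivial check, with the 4,096-byte page size hard-coded per the AWS guidance quoted above (an illustrative sketch, not code from iotune):

```cpp
#include <cstdint>

constexpr uint64_t nand_page_size = 4096; // per the AWS guidance quoted above

// A write bypasses read-modify-write only if it starts on a 4 KiB boundary
// and covers whole 4 KiB pages.
constexpr bool avoids_rmw(uint64_t offset, uint64_t length) {
    return offset % nand_page_size == 0 && length % nand_page_size == 0;
}

static_assert(!avoids_rmw(512, 512)); // the 512-byte pattern picked by iotune
static_assert(avoids_rmw(0, 4096));   // a 4 KiB-aligned, 4 KiB-sized write
```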
Example from GCP, where sector size is 4096:
"nvme0n1": {
"holders": [],
"host": "Non-Volatile memory controller: Google, Inc. Device 001f (rev 01)",
"links": {
"ids": [
"google-local-nvme-ssd-0",
"nvme-nvme.1ae0-6e766d655f63617264-6e766d655f63617264-00000001"
],
"labels": [],
"masters": [
"md0"
],
"uuids": []
},
"model": "nvme_card",
"partitions": {},
"removable": "0",
"rotational": "0",
"sas_address": null,
"sas_device_handle": null,
"scheduler_mode": "none",
"sectors": "786432000",
"sectorsize": "4096", <---- THIS
"size": "375.00 GB",
"support_discard": "4096",
"vendor": null,
Reviving this issue, in the hope it'll make it to Scylla 6.0. Ping @xemul, @avikivity
Collected from i4i.4xl by @pwrobelse
Hello, please find some experiments with iotune
related to buffer size of random IO. I used i4i.4xlarge
instance type and AMI
with ScyllaDB 5.4.6
. The measurements are a follow-up to issue#13477.
To prepare the machine the following steps were performed:
- I opened ScyllaDB website with available AMIs of open source version and used version 5.4.6.
- An instance of
i4i.4xlarge
withScyllaDB 5.4.6 AMI
was launched via EC2 console. - I ensured that setup of ScyllaDB had finished and then I performed the steps described in the issue#13477 to reproduce low IOPS of random write - the logs from setup can be found below.
Step 1: stop scylla-server and verify that it is not running.
scyllaadm@ip-172-31-46-190:~$ sudo systemctl stop scylla-server
scyllaadm@ip-172-31-46-190:~$ ps aux | grep scylla
scylla 534 0.0 0.0 1240236 11700 ? Ssl 10:41 0:00 /opt/scylladb/node_exporter/node_exporter --collector.interrupts
root 4015 0.0 0.0 16928 10968 ? Ss 10:45 0:00 sshd: scyllaadm [priv]
scyllaa+ 4032 0.0 0.0 16932 9792 ? Ss 10:45 0:00 /lib/systemd/systemd --user
scyllaa+ 4033 0.0 0.0 169820 4396 ? S 10:45 0:00 (sd-pam)
scyllaa+ 4048 0.0 0.0 17200 7932 ? S 10:45 0:00 sshd: scyllaadm@pts/0
scyllaa+ 4049 0.0 0.0 5048 4108 pts/0 Ss 10:45 0:00 -bash
scyllaa+ 4186 0.0 0.0 7484 3320 pts/0 R+ 10:46 0:00 ps aux
scyllaa+ 4187 0.0 0.0 4024 2000 pts/0 S+ 10:46 0:00 grep --color=auto scylla
Step 2: list available disks:
scyllaadm@ip-172-31-46-190:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme0n1 259:0 0 30G 0 disk
├─nvme0n1p1 259:1 0 29.9G 0 part /
├─nvme0n1p14 259:2 0 4M 0 part
└─nvme0n1p15 259:3 0 106M 0 part /boot/efi
nvme1n1 259:4 0 3.4T 0 disk /var/lib/systemd/coredump
/var/lib/scylla
Step 3: run perftune.py.
scyllaadm@ip-172-31-46-190:~$ sudo /opt/scylladb/scripts/perftune.py --nic eth0 --tune-clock --dir /var/lib/scylla --tune disks --tune net --tune system --dev nvme1n1
irqbalance is not running
No non-NVMe disks to tune
Setting NVMe disks: nvme1n1...
Setting mask 00000001 in /proc/irq/24/smp_affinity
Writing 'none' to /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1/queue/scheduler
Writing '2' to /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1/queue/nomerges
Setting a physical interface eth0...
Executing: ethtool -L eth0 rx 2
Executing: ethtool -L eth0 combined 2
Distributing IRQs handling Rx and Tx for first 2 channels:
Setting mask 00000001 in /proc/irq/45/smp_affinity
Setting mask 00000100 in /proc/irq/46/smp_affinity
Distributing the rest of IRQs
Setting mask 0000ffff in /sys/class/net/eth0/queues/rx-1/rps_cpus
Setting mask 0000ffff in /sys/class/net/eth0/queues/rx-0/rps_cpus
Setting net.core.rps_sock_flow_entries to 32768
Setting limit 16384 in /sys/class/net/eth0/queues/rx-1/rps_flow_cnt
Setting limit 16384 in /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
Trying to enable ntuple filtering HW offload for eth0...not supported
Setting mask 00000f0f in /sys/class/net/eth0/queues/tx-0/xps_cpus
Setting mask 0000f0f0 in /sys/class/net/eth0/queues/tx-1/xps_cpus
Writing '4096' to /proc/sys/net/core/somaxconn
Writing '4096' to /proc/sys/net/ipv4/tcp_max_syn_backlog
Setting clocksource to tsc
Step 4: run iotune provided by the AMI - low random-write IOPS is visible (actual: 91k, expected: 200k).
scyllaadm@ip-172-31-46-190:~$ sudo iotune --evaluation-directory /var/lib/scylla --properties-file /tmp/io_properties.yaml
INFO 2024-05-09 10:49:02,609 seastar - Reactor backend: linux-aio
INFO 2024-05-09 10:49:02,798 [shard 0:main] iotune - /var/lib/scylla passed sanity checks
INFO 2024-05-09 10:49:02,799 [shard 0:main] iotune - Disk parameters: max_iodepth=127 disks_per_array=1 minimum_io_size=512
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 1685 MB/s (deviation 5%)
Measuring sequential read bandwidth: 2960 MB/s (deviation 4%)
Measuring random write IOPS: 91700 IOPS
Measuring random read IOPS: 313992 IOPS
Writing result to /tmp/io_properties.yaml
Step 5: check the block and IO sizes reported by /sys/block/nvme1n1/queue - 4KB is not present:
scyllaadm@ip-172-31-46-190:~$ cat /sys/block/nvme1n1/queue/physical_block_size
512
scyllaadm@ip-172-31-46-190:~$ cat /sys/block/nvme1n1/queue/logical_block_size
512
scyllaadm@ip-172-31-46-190:~$ cat /sys/block/nvme1n1/queue/minimum_io_size
512
scyllaadm@ip-172-31-46-190:~$ cat /sys/block/nvme1n1/queue/optimal_io_size
0
Step 6: build iotune from the latest master and run it with the same command - low random-write IOPS is visible (actual: 91k, expected: 200k).
scyllaadm@ip-172-31-46-190:~/repo/seastar$ sudo ./build/release/apps/iotune/iotune --evaluation-directory /var/lib/scylla --properties-file /tmp/io_properties.yaml
INFO 2024-05-09 11:01:00,909 seastar - Reactor backend: io_uring
INFO 2024-05-09 11:01:01,178 [shard 0:main] iotune - /var/lib/scylla passed sanity checks
INFO 2024-05-09 11:01:01,179 [shard 0:main] iotune - Disk parameters: max_iodepth=127 disks_per_array=1 minimum_io_size=512
INFO 2024-05-09 11:01:01,180 [shard 0:main] iotune - Filesystem parameters: read alignment 512, write alignment 1024
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 1683 MB/s (deviation 5%)
Measuring sequential read bandwidth: 2946 MB/s (deviation 4%)
Measuring random write IOPS: 91662 IOPS
Measuring random read IOPS: 314742 IOPS
Writing result to /tmp/io_properties.yaml
Step 7: apply the patch from PR#2204 and force a 4096-byte buffer size for random IO with the new parameter.
Interestingly, the behavior was not consistent. Depending on the machine I saw different results despite performing exactly the same steps. Random-write IOPS was either better (140k instead of 91k, still below the expected 200k) or much worse (47k vs 91k). In total I used 6-7 instances and the increase/decrease in IOPS seemed to occur randomly - the result was either 47k or 140k.
Machine 1 - degradation:
scyllaadm@ip-172-31-46-190:~/repo/seastar$ sudo ./build/release/apps/iotune/iotune --random-io-buffer-size 4096 --evaluation-directory /var/lib/scylla --properties-file /tmp/io_properties.yaml
INFO 2024-05-09 11:05:48,858 seastar - Reactor backend: io_uring
INFO 2024-05-09 11:05:49,070 [shard 0:main] iotune - /var/lib/scylla passed sanity checks
INFO 2024-05-09 11:05:49,071 [shard 0:main] iotune - Disk parameters: max_iodepth=127 disks_per_array=1 minimum_io_size=512
INFO 2024-05-09 11:05:49,071 [shard 0:main] iotune - Forcing buffer_size=4096 for random IO!
INFO 2024-05-09 11:05:49,072 [shard 0:main] iotune - Filesystem parameters: read alignment 512, write alignment 1024
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 1672 MB/s (deviation 6%)
Measuring sequential read bandwidth: 2945 MB/s (deviation 4%)
Measuring random write IOPS: 47711 IOPS
Measuring random read IOPS: 261883 IOPS
Writing result to /tmp/io_properties.yaml
Machine 2 - improvement:
scyllaadm@ip-172-31-44-9:~/repo/seastar$ sudo ./build/release/apps/iotune/iotune --random-io-buffer-size 4096 --evaluation-directory /var/lib/scylla --properties-file /tmp/io_properties.yaml
INFO 2024-05-09 11:56:00,265 seastar - Reactor backend: io_uring
INFO 2024-05-09 11:56:00,473 [shard 0:main] iotune - /var/lib/scylla passed sanity checks
INFO 2024-05-09 11:56:00,474 [shard 0:main] iotune - Disk parameters: max_iodepth=127 disks_per_array=1 minimum_io_size=512
INFO 2024-05-09 11:56:00,474 [shard 0:main] iotune - Forcing buffer_size=4096 for random IO!
INFO 2024-05-09 11:56:00,475 [shard 0:main] iotune - Filesystem parameters: read alignment 512, write alignment 1024
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 1870 MB/s (deviation 15%)
Measuring sequential read bandwidth: 2946 MB/s (deviation 4%)
Measuring random write IOPS: 140260 IOPS
Measuring random read IOPS: 275969 IOPS
Writing result to /tmp/io_properties.yaml
Step 8: re-create XFS with BS=4096 instead of BS=1024.
When scylla_raid_setup runs mkfs.xfs it uses block_size = max(1024, sector_size). The code can be found here. Therefore, xfs_info reported a block size of 1024. I re-created XFS with BS=4096, which is mkfs.xfs's default block size (source here).
scyllaadm@ip-172-31-46-190:~/repo/seastar$ sudo umount /var/lib/scylla
scyllaadm@ip-172-31-46-190:~/repo/seastar$ sudo umount /var/lib/systemd/coredump
scyllaadm@ip-172-31-46-190:~/repo/seastar$ sudo mkfs.xfs -f -b size=4096 /dev/nvme1n1 -K
meta-data=/dev/nvme1n1 isize=512 agcount=4, agsize=228881836 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1 bigtime=0 inobtcount=0
data = bsize=4096 blocks=915527343, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=447034, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
scyllaadm@ip-172-31-46-190:~/repo/seastar$ sudo mount /dev/nvme1n1 /var/lib/scylla
scyllaadm@ip-172-31-46-190:~/repo/seastar$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
nvme0n1 259:0 0 30G 0 disk
├─nvme0n1p1 259:1 0 29.9G 0 part /
├─nvme0n1p14 259:2 0 4M 0 part
└─nvme0n1p15 259:3 0 106M 0 part /boot/efi
nvme1n1 259:4 0 3.4T 0 disk /var/lib/scylla
Step 9: rerun the test with BS=4096 - random write IOPS increased to 240k on both machines.
Machine 1:
scyllaadm@ip-172-31-46-190:~/repo/seastar$ sudo ./build/release/apps/iotune/iotune --random-io-buffer-size 4096 --evaluation-directory /var/lib/scylla --properties-file /tmp/io_properties.yaml
INFO 2024-05-09 11:16:52,469 seastar - Reactor backend: io_uring
INFO 2024-05-09 11:16:52,670 [shard 0:main] iotune - /var/lib/scylla passed sanity checks
INFO 2024-05-09 11:16:52,671 [shard 0:main] iotune - Disk parameters: max_iodepth=127 disks_per_array=1 minimum_io_size=512
INFO 2024-05-09 11:16:52,671 [shard 0:main] iotune - Forcing buffer_size=4096 for random IO!
INFO 2024-05-09 11:16:52,672 [shard 0:main] iotune - Filesystem parameters: read alignment 512, write alignment 4096
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 2181 MB/s (deviation 3%)
Measuring sequential read bandwidth: 2950 MB/s (deviation 8%)
Measuring random write IOPS: 242899 IOPS (deviation 12%)
Measuring random read IOPS: 350294 IOPS
Writing result to /tmp/io_properties.yaml
Machine 2:
scyllaadm@ip-172-31-44-9:~/repo/seastar$ sudo ./build/release/apps/iotune/iotune --random-io-buffer-size 4096 --evaluation-directory /var/lib/scylla --properties-file /tmp/io_properties.yaml
INFO 2024-05-09 11:59:04,329 seastar - Reactor backend: io_uring
INFO 2024-05-09 11:59:04,533 [shard 0:main] iotune - /var/lib/scylla passed sanity checks
INFO 2024-05-09 11:59:04,534 [shard 0:main] iotune - Disk parameters: max_iodepth=127 disks_per_array=1 minimum_io_size=512
INFO 2024-05-09 11:59:04,534 [shard 0:main] iotune - Forcing buffer_size=4096 for random IO!
INFO 2024-05-09 11:59:04,535 [shard 0:main] iotune - Filesystem parameters: read alignment 512, write alignment 4096
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 2181 MB/s (deviation 3%)
Measuring sequential read bandwidth: 2945 MB/s (deviation 8%)
Measuring random write IOPS: 239202 IOPS (deviation 12%)
Measuring random read IOPS: 351342 IOPS
Writing result to /tmp/io_properties.yaml
Step 10: rerun iotune from the ScyllaDB 5.4.6 AMI.
The actual block size used by iotune in this case was 4096, because it sets buffer_size = std::max(buffer_size, _file.disk_write_dma_alignment());. The alignment returned by posix_file_impl uses the XFS block size as the write alignment. The result was better - random-write IOPS increased from 91k to 240k.
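The effect of that clamp can be illustrated with a small sketch (disk_write_dma_alignment() itself is Seastar's API; the free function below is just an illustration):

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of the buffer-size selection described above: whatever buffer size
// iotune starts with is raised to the file's write DMA alignment, which in
// turn tracks the filesystem block size.
uint64_t effective_random_io_buffer_size(uint64_t requested,
                                         uint64_t write_dma_alignment) {
    return std::max(requested, write_dma_alignment);
}
```

With XFS_BS=1024 the reported write alignment was 1024, so a 512-byte request was raised to 1024; with XFS_BS=4096 it was raised to 4096, matching the "write alignment" values printed in the logs of steps 6 and 9.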
Machine 1:
scyllaadm@ip-172-31-46-190:~/repo/seastar$ sudo iotune --evaluation-directory /var/lib/scylla --properties-file /tmp/io_properties.yaml
INFO 2024-05-09 11:24:19,329 seastar - Reactor backend: linux-aio
INFO 2024-05-09 11:24:19,514 [shard 0:main] iotune - /var/lib/scylla passed sanity checks
INFO 2024-05-09 11:24:19,515 [shard 0:main] iotune - Disk parameters: max_iodepth=127 disks_per_array=1 minimum_io_size=512
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 2181 MB/s (deviation 3%)
Measuring sequential read bandwidth: 2969 MB/s (deviation 7%)
Measuring random write IOPS: 240261 IOPS (deviation 12%)
Measuring random read IOPS: 312452 IOPS
Writing result to /tmp/io_properties.yaml
Machine 2:
scyllaadm@ip-172-31-44-9:~/repo/seastar$ sudo iotune --evaluation-directory /var/lib/scylla --properties-file /tmp/io_properties.yaml
INFO 2024-05-09 12:03:04,121 seastar - Reactor backend: linux-aio
INFO 2024-05-09 12:03:04,305 [shard 0:main] iotune - /var/lib/scylla passed sanity checks
INFO 2024-05-09 12:03:04,306 [shard 0:main] iotune - Disk parameters: max_iodepth=127 disks_per_array=1 minimum_io_size=512
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 2181 MB/s (deviation 3%)
Measuring sequential read bandwidth: 2969 MB/s (deviation 7%)
Measuring random write IOPS: 240272 IOPS (deviation 12%)
Measuring random read IOPS: 313831 IOPS
Writing result to /tmp/io_properties.yaml
Summary of the experiments
Despite performing exactly the same steps on different machines of the same instance type (i4i.4xlarge), the results from iotune were inconsistent when the block size was changed only at the level of iotune. Using iotune --random-io-buffer-size 4096 either decreased random-write IOPS to 47k or increased it to 140k; the outcome appeared random.
When mkfs.xfs was run with block size 4096, both machines showed 240k random-write IOPS.
The table below summarizes the obtained results:
| iotune binary and backend | XFS_block_size | iotune_block_size | random_write_IOPS | Notes |
|---|---|---|---|---|
| 5.4.6 AMI + linux-aio | 1024 | 1024 | 91k | |
| master + io_uring | 1024 | 1024 | 91k | |
| master + io_uring | 1024 | 4096 | Inconsistent - either 47k or 140k | |
| 5.4.6 AMI + linux-aio | 4096 | 4096 | 240k | I am not sure if re-running mkfs.xfs with -f -K corrupts the results |
| master + io_uring | 4096 | 4096 | 240k | I am not sure if re-running mkfs.xfs with -f -K corrupts the results |
The values exposed by /sys/block/nvme1n1/queue did not contain 4096. In the case of mkfs.xfs, 4096 is a hard-coded default value.
- Please add the Reactor backend that you've used - in some cases it was io_uring, in some cases linux-aio.
- It would be interesting (not now, but in the very near term future) to re-try with Ubuntu LTS 24.04, which is using kernel 6.8.
> Please add the Reactor backend that you've used - in some cases it was io_uring, in some cases linux-aio.

> It would be interesting (not now, but in the very near term future) to re-try with Ubuntu LTS 24.04, which is using kernel 6.8.
Regarding the first question - it seems that iotune built from master and run from the seastar repo used io_uring by default. On the other hand, iotune from the ScyllaDB 5.4.6 AMI used linux-aio by default. I updated the summary at the end of the first comment.
The results from both backends looked similar - see the statements below.
- The results obtained via iotune from the AMI (backend=linux-aio) with XFS_BS=1024 and iotune_BS=1024 had the same random-write IOPS as the results obtained via iotune from the latest master (backend=io_uring) with XFS_BS=1024 and iotune_BS=1024. Please compare the logs from step 4 and step 6.
- The results obtained via iotune from the AMI (backend=linux-aio) with XFS_BS=4096 and iotune_BS=4096 had the same random-write IOPS as the results obtained via iotune from the latest master (backend=io_uring) with XFS_BS=4096 and iotune_BS=4096. Please compare the logs from step 9 and step 10.

During the next experiments I will specify the backend explicitly.
Hello, please find the results of another experiment related to specifying a different XFS block size during scylla_setup. The goal was to check whether the results seen after re-creating XFS with a larger block size in the previous experiments could be reproduced.
Test scenario
- Create an i4i.4xlarge instance with Ubuntu 22.04.
- Install ScyllaDB from a package according to the official tutorial.
- Optional: tweak the block size value passed to XFS in /opt/scylladb/scripts/libexec/scylla_raid_setup.
- Call scylla_setup, configure RAID0 and XFS, and inspect the results from iotune.
Used scylla package:
ubuntu@ip-172-31-41-145:~$ scylla --version
5.4.6-0.20240418.10f137e367e3
Execution 1: XFS block size equals 1024 (the default value used by scylla_raid_setup)
The first scenario serves as a control sample. The problem with low random-write IOPS was reproduced (91k IOPS). Note: after alignment iotune used BS=1024.
Do you want IOTune to study your disks IO profile and adapt Scylla to it? (*WARNING* Saying NO here means the node will not boot in production mode unless you configure the I/O Subsystem manually!)
Yes - let iotune study my disk(s). Note that this action will take a few minutes. No - skip this step.
[YES/no]
tuning /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
tuning: /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1/queue/nomerges 2
tuning /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
tuning /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
tuning /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
tuning /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
INFO 2024-05-09 13:58:44,485 seastar - Reactor backend: linux-aio
INFO 2024-05-09 13:58:44,710 [shard 0:main] iotune - /var/lib/scylla/saved_caches passed sanity checks
INFO 2024-05-09 13:58:44,710 [shard 0:main] iotune - Disk parameters: max_iodepth=127 disks_per_array=1 minimum_io_size=512
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 2180 MB/s
Measuring sequential read bandwidth: 2971 MB/s (deviation 7%)
Measuring random write IOPS: 91687 IOPS
Measuring random read IOPS: 313118 IOPS
Writing result to /etc/scylla.d/io_properties.yaml
Writing result to /etc/scylla.d/io.conf
Execution 2: XFS block size equals 4096
The block size used for XFS was changed to 4096 in /opt/scylladb/scripts/libexec/scylla_raid_setup. Then scylla_setup was run. The problem did not occur - iotune from ScyllaDB 5.4.6 showed 240k random-write IOPS. Note: after alignment iotune used BS=4096.
Do you want IOTune to study your disks IO profile and adapt Scylla to it? (*WARNING* Saying NO here means the node will not boot in production mode unless you configure the I/O Subsystem manually!)
Yes - let iotune study my disk(s). Note that this action will take a few minutes. No - skip this step.
[YES/no]
tuning /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
tuning: /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1/queue/nomerges 2
tuning /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
tuning /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
tuning /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
tuning /sys/devices/pci0000:00/0000:00:1f.0/nvme/nvme1/nvme1n1
INFO 2024-05-09 14:09:05,996 seastar - Reactor backend: linux-aio
INFO 2024-05-09 14:09:06,217 [shard 0:main] iotune - /var/lib/scylla/saved_caches passed sanity checks
INFO 2024-05-09 14:09:06,218 [shard 0:main] iotune - Disk parameters: max_iodepth=127 disks_per_array=1 minimum_io_size=512
Starting Evaluation. This may take a while...
Measuring sequential write bandwidth: 2179 MB/s
Measuring sequential read bandwidth: 2971 MB/s (deviation 7%)
Measuring random write IOPS: 239977 IOPS (deviation 11%)
Measuring random read IOPS: 311838 IOPS
Writing result to /etc/scylla.d/io_properties.yaml
Writing result to /etc/scylla.d/io.conf
Summary
Given the previous experiments and the current one, it appears that the random-write IOPS measured by iotune is correct when both XFS_BS=4096 and iotune_BS=4096 are used on i4i.4xlarge.
There was a reason to use 1K, something with the RAID stripes... @xemul ?
> There was a reason to use 1K, something with the RAID stripes... @xemul ?
No, it's just due to the way commitlog works. It needs to write aligned buffers, so with a 4k minimum IO size the segments may grow too fast.
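To illustrate the concern with a back-of-the-envelope sketch (not commitlog's actual code): every flush is padded up to the write alignment, so a small flush is amplified four times more with a 4k minimum than with 1k.

```cpp
#include <cstdint>

// Round len up to the next multiple of alignment (alignment must be a power of two).
constexpr uint64_t align_up(uint64_t len, uint64_t alignment) {
    return (len + alignment - 1) & ~(alignment - 1);
}

// A 100-byte commitlog flush occupies 1024 bytes on disk with 1k alignment,
// but 4096 bytes with 4k alignment - the segment grows four times faster
// for the same payload.
static_assert(align_up(100, 1024) == 1024);
static_assert(align_up(100, 4096) == 4096);
```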