IO values for i7ie.2xlarge are identical to i7ie.xlarge
i7ie.xlarge: read_iops: 117257, read_bandwidth: 1148572714, write_iops: 94180, write_bandwidth: 505684885
i7ie.2xlarge: read_iops: 117257, read_bandwidth: 1148572714, write_iops: 94180, write_bandwidth: 505684885

Whereas https://docs.aws.amazon.com/ec2/latest/instancetypes/so.html#so_instance-store says:
| Instance type | Instance store volumes | Random read IOPS / write IOPS |
|---|---|---|
| i7ie.large | 1 x 1250 GB NVMe SSD | 54,166 / 43,333 ✓ |
| i7ie.xlarge | 1 x 2500 GB NVMe SSD | 108,333 / 86,666 ✓ |
| i7ie.2xlarge | 2 x 2500 GB NVMe SSD | 216,666 / 173,332 ✓ |

So double the performance between them.
Originally posted by @mykaul in #559
@syuu1228 is it just a copy-paste error, or are these the values we get for i7ie.2xlarge?
Here are the raw values of the i7ie iotune results; iotune was executed 3 times on each instance, and the averages are shown at the bottom (see the sketch below the table). It seems i7ie.large uses the slowest SSD, i7ie.xlarge and i7ie.2xlarge use a medium-speed SSD, and the larger instances (3xlarge, 6xlarge, 12xlarge, ...) use the fastest SSD. So I decided to copy the i7ie.xlarge parameters to i7ie.2xlarge.
| instance_type | read_iops | read_bandwidth | write_iops | write_bandwidth |
|---|---|---|---|---|
| i7ie.large.0 | 58450 | 574886272 | 47148 | 253077216 |
| i7ie.large.1 | 58447 | 574871872 | 47144 | 253152960 |
| i7ie.large.2 | 58452 | 574805824 | 47145 | 253168576 |
| i7ie.xlarge.0 | 117261 | 1148629760 | 94184 | 505675680 |
| i7ie.xlarge.1 | 117253 | 1148723072 | 94177 | 505678784 |
| i7ie.xlarge.2 | 117257 | 1148365312 | 94181 | 505700192 |
| i7ie.2xlarge.0 | 117266 | 1148608256 | 94166 | 505745536 |
| i7ie.2xlarge.1 | 117270 | 1148134016 | 94161 | 505677120 |
| i7ie.2xlarge.2 | 117266 | 1148369024 | 94177 | 505698688 |
| i7ie.3xlarge.0 | 352843 | 3422118400 | 119127 | 1526412672 |
| i7ie.3xlarge.1 | 352844 | 3421702144 | 118973 | 1526431104 |
| i7ie.3xlarge.2 | 352817 | 3424049152 | 119881 | 1526483456 |
| i7ie.6xlarge.0 | 352813 | 3432191232 | 118808 | 1526523520 |
| i7ie.6xlarge.1 | 352837 | 3424704512 | 119741 | 1526485376 |
| i7ie.6xlarge.2 | 352822 | 3428499456 | 119246 | 1526491136 |
| i7ie.12xlarge.0 | 352835 | 3422241280 | 119566 | 1526699136 |
| i7ie.12xlarge.1 | 352829 | 3425162240 | 119214 | 1526719872 |
| i7ie.12xlarge.2 | 352832 | 3424160000 | 118033 | 1526715776 |
| i7ie.18xlarge.0 | 352825 | 3425764608 | 119544 | 1526821376 |
| i7ie.18xlarge.1 | 352816 | 3425155072 | 119555 | 1526833664 |
| i7ie.18xlarge.2 | 352815 | 3428421120 | 119518 | 1526809088 |
| i7ie.24xlarge.0 | 352824 | 3424752128 | 119147 | 1526504320 |
| i7ie.24xlarge.1 | 352832 | 3422750976 | 119154 | 1526500992 |
| i7ie.24xlarge.2 | 352826 | 3424811008 | 119535 | 1526532480 |
| i7ie.48xlarge.0 | 352834 | 3423049728 | 119516 | 1526578176 |
| i7ie.48xlarge.1 | 352831 | 3424246016 | 119574 | 1526628352 |
| i7ie.48xlarge.2 | 352815 | 3428381440 | 119226 | 1526603392 |
| i7ie.large | 58449 | 574854656 | 47145 | 253132917 |
| i7ie.xlarge | 117257 | 1148572714 | 94180 | 505684885 |
| i7ie.2xlarge | 117267 | 1148370432 | 94168 | 505707114 |
| i7ie.3xlarge | 352834 | 3422623232 | 119327 | 1526442410 |
| i7ie.6xlarge | 352824 | 3428465066 | 119265 | 1526500010 |
| i7ie.12xlarge | 352832 | 3423854506 | 118937 | 1526711594 |
| i7ie.18xlarge | 352818 | 3426446933 | 119539 | 1526821376 |
| i7ie.24xlarge | 352827 | 3424104704 | 119278 | 1526512597 |
| i7ie.48xlarge | 352826 | 3425225728 | 119438 | 1526603306 |
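For clarity, a minimal sketch of how the per-instance averages at the bottom of the table can be derived from the three runs. This is not the actual tooling; the dict layout and run labels are hypothetical, and only i7ie.2xlarge is shown:

```python
from collections import defaultdict

# Hypothetical layout: one dict per iotune run, keyed by "<instance_type>.<run>".
runs = {
    "i7ie.2xlarge.0": {"read_iops": 117266, "read_bandwidth": 1148608256,
                       "write_iops": 94166, "write_bandwidth": 505745536},
    "i7ie.2xlarge.1": {"read_iops": 117270, "read_bandwidth": 1148134016,
                       "write_iops": 94161, "write_bandwidth": 505677120},
    "i7ie.2xlarge.2": {"read_iops": 117266, "read_bandwidth": 1148369024,
                       "write_iops": 94177, "write_bandwidth": 505698688},
}

# Group the runs by instance type (strip the trailing ".<run>" index).
grouped = defaultdict(list)
for name, values in runs.items():
    instance_type = name.rsplit(".", 1)[0]
    grouped[instance_type].append(values)

# Average each parameter over the runs, truncating to whole numbers
# as in the averaged rows of the table above.
for instance_type, samples in grouped.items():
    avg = {k: sum(s[k] for s in samples) // len(samples) for k in samples[0]}
    print(instance_type, avg)
    # i7ie.2xlarge -> read_iops 117267, write_iops 94168, ...
```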
I just ran iotune on i7ie.2xlarge again to make sure the current parameters are correct; the result was:
disks:
- mountpoint: /var/lib/scylla
read_iops: 117258
read_bandwidth: 1150152192
write_iops: 94138
write_bandwidth: 508599840
It seems the result is unchanged; it's almost the same performance as i7ie.xlarge.
These results don't seem to match Amazon's documentation, but aren't the figures in Amazon's documentation the sum across all drives? If that is correct, they seem to roughly match our measurements.
Sounds ok to me. We will know the real effect only when running performance tests on these instance types, but I don't think that is necessary right now. We may want @avikivity or @xemul to approve that.
Can we run fio to re-verify? It makes little sense. And we can open a support issue with AWS to clarify that.
Here are the fio benchmark results on i7ie instances. I used this script to run the fio benchmark, which is borrowed from an Alibaba Cloud document, and I added settings to run 4k random read, 4k random write, 1024k sequential read, and 1024k sequential write.
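For reference, a minimal sketch of the four added workloads. This is not the actual script (that one is borrowed from the Alibaba Cloud document); the device path, queue depth, job count, and runtime below are assumptions:

```python
import subprocess

# Hypothetical target device; running fio against the raw device is destructive.
DEVICE = "/dev/nvme1n1"

workloads = [
    ("rand_read_4k",    {"rw": "randread",  "bs": "4k"}),
    ("rand_write_4k",   {"rw": "randwrite", "bs": "4k"}),
    ("seq_read_1024k",  {"rw": "read",      "bs": "1024k"}),
    ("seq_write_1024k", {"rw": "write",     "bs": "1024k"}),
]

for name, opts in workloads:
    # Assumed tuning values (iodepth/numjobs/runtime); the real script may differ.
    cmd = [
        "fio", f"--name={name}", f"--filename={DEVICE}",
        f"--rw={opts['rw']}", f"--bs={opts['bs']}",
        "--ioengine=libaio", "--direct=1", "--iodepth=128",
        "--numjobs=4", "--runtime=60", "--time_based",
        "--group_reporting", "--output-format=json", f"--output={name}.json",
    ]
    subprocess.run(cmd, check=True)
```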
Below are the results. It seems write IOPS on the larger nodes are faster than with iotune, but the rest are almost the same.
| instance_type | read_iops | read_bandwidth | write_iops | write_bandwidth |
|---|---|---|---|---|
| i7ie.large.0 | 57358.897444 | 529908613 | 46014.93184 | 259432856 |
| i7ie.large.1 | 57425.290804 | 530048639 | 46048.393548 | 259363116 |
| i7ie.large.2 | 57363.097024 | 529908987 | 46013.29867 | 259441124 |
| i7ie.xlarge.0 | 114826.156512 | 1059567362 | 91784.735877 | 519619300 |
| i7ie.xlarge.1 | 114901.779763 | 1059531011 | 91780.403266 | 519585084 |
| i7ie.xlarge.2 | 114817.590988 | 1059636413 | 91773.752791 | 519619610 |
| i7ie.2xlarge.0 | 114499.933351 | 1060448511 | 91714.704608 | 519484064 |
| i7ie.2xlarge.1 | 114628.811357 | 1060098813 | 91753.59035 | 519242601 |
| i7ie.2xlarge.2 | 114634.264196 | 1059995135 | 91749.3336 | 519415015 |
| i7ie.3xlarge.0 | 341885.356262 | 3164666192 | 271560.887822 | 1591678703 |
| i7ie.3xlarge.1 | 341857.15714 | 3164807604 | 271270.670177 | 1592670126 |
| i7ie.3xlarge.2 | 341912.547909 | 3164877365 | 271464.740385 | 1592007490 |
| i7ie.6xlarge.0 | 341806.064645 | 3163721089 | 271102.225776 | 1592563132 |
| i7ie.6xlarge.1 | 341803.378649 | 3163337088 | 271409.52254 | 1591173459 |
| i7ie.6xlarge.2 | 341627.54898 | 3164103311 | 271284.876553 | 1591975327 |
| i7ie.12xlarge.0 | 341811.719244 | 3162599725 | 271509.075166 | 1589655775 |
| i7ie.12xlarge.1 | 341687.506245 | 3162152063 | 271441.783787 | 1590894341 |
| i7ie.12xlarge.2 | 341788.307795 | 3161216516 | 271374.100719 | 1590740381 |
| i7ie.18xlarge.0 | 341754.229386 | 3161028367 | 271493.025269 | 1590387473 |
| i7ie.18xlarge.1 | 341806.79964 | 3161857281 | 271453.425661 | 1591915683 |
| i7ie.18xlarge.2 | 341805.860806 | 3161339210 | 271366.006326 | 1589874483 |
| i7ie.24xlarge.0 | 341971.82911 | 3159988937 | 271577.088258 | 1593055793 |
| i7ie.24xlarge.1 | 341996.970201 | 3161060380 | 271490.462399 | 1589832572 |
| i7ie.24xlarge.2 | 341951.683261 | 3160438313 | 271478.476546 | 1590657273 |
| i7ie.48xlarge.0 | 342012.549933 | 3157185206 | 271378.191259 | 1590598469 |
| i7ie.48xlarge.1 | 342010.387189 | 3160187071 | 271492.643632 | 1590283562 |
| i7ie.48xlarge.2 | 341819.746347 | 3161257346 | 271371.080487 | 1590534718 |
| i7ie.large | 57382 | 529955413 | 46025 | 259412365 |
| i7ie.xlarge | 114848 | 1059578262 | 91779 | 519607998 |
| i7ie.2xlarge | 114587 | 1060180819 | 91739 | 519380560 |
| i7ie.3xlarge | 341885 | 3164783720 | 271432 | 1592118773 |
| i7ie.6xlarge | 341745 | 3163720496 | 271265 | 1591903972 |
| i7ie.12xlarge | 341762 | 3161989434 | 271441 | 1590430165 |
| i7ie.18xlarge | 341788 | 3161408286 | 271437 | 1590725879 |
| i7ie.24xlarge | 341973 | 3160495876 | 271515 | 1591181879 |
| i7ie.48xlarge | 341947 | 3159543207 | 271413 | 1590472249 |
(RAW result files are here)
And we can open a support issue with AWS to clarify that.
I tried to open an issue in AWS support but I got "You don't have the necessary IAM permissions to view that support case". Can anyone else with sufficient permissions open the issue?
Please file a ticket with [email protected] to give you permissions to engage with AWS support.
@syuu1228 I have direct contacts on the storage team working on i7ie and i7i. I'll forward you their information; please CC me on the email thread to them.
@roydahan - any news? Could it be that we are not setting RAID0 on i7ie.2xlarge?
Also, it seems that:
i7ie.3xlarge 341885 3164783720 271432 1592118773
i7ie.6xlarge 341745 3163720496 271265 1591903972
have the same issue?
@roydahan - any news? Could it be that we are not setting RAID0 on i7ie.2xlarge?
I don't remember seeing an email. @yaronkaikov let's move it forward together.
@syuu1228 I'm re-reading your original comment and I'm a bit confused. You wrote:
I just ran iotune on i7ie.2xlarge again to make sure the current parameters are correct; the result was:
disks:
- mountpoint: /var/lib/scylla
  read_iops: 117258
  read_bandwidth: 1150152192
  write_iops: 94138
  write_bandwidth: 508599840

It seems the result is unchanged; it's almost the same performance as i7ie.xlarge.
These results don't seem to match Amazon's documentation, but aren't the figures in Amazon's documentation the sum across all drives? If that is correct, they seem to roughly match our measurements.
The answer is that Amazon's documentation does indicate the sum of all drives. But I don't understand the second part, where you write that it matches our measurements. How? The measurement you show above for i7ie.2xlarge is half of what they publish: read_iops 117K vs 216K, write_iops 94K vs 173K.
Are you running iotune using our AMI, on a RAID0 of all disks? (IIUC what you wrote in the PR for i7i, it seems that you don't...)
@roydahan My description was not good. I meant that we measure single-disk performance, and then our setup code multiplies these parameters by the number of disks:
for p in ["read_iops", "read_bandwidth", "write_iops", "write_bandwidth"]:
self.disk_properties[p] = io_params[t][p] * nr_disks
self.save()
https://github.com/scylladb/scylla-machine-image/blob/next/common/scylla_cloud_io_setup#L58
On i7ie.2xlarge, read_iops = 117258 * 2 = 234516 and write_iops = 94138 * 2 = 188276. So it will be read_iops: 235K vs 216K, write_iops: 188K vs 173K.
The measurement environment is Ubuntu 24.04, which we use as the base image; since our own image automatically constructs the RAID0, it is easier to measure single-drive performance using the plain base image.
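As an illustration, here is a minimal, self-contained sketch of the same scaling logic. It is not the scylla-machine-image code itself; the dict layout is just for the example, and the numbers are taken from the averaged iotune measurements above:

```python
# Single-disk iotune measurements (averages from the table above) and disk counts.
io_params = {
    "i7ie.xlarge":  {"read_iops": 117257, "read_bandwidth": 1148572714,
                     "write_iops": 94180, "write_bandwidth": 505684885},
    "i7ie.2xlarge": {"read_iops": 117267, "read_bandwidth": 1148370432,
                     "write_iops": 94168, "write_bandwidth": 505707114},
}
nr_disks = {"i7ie.xlarge": 1, "i7ie.2xlarge": 2}

# Scale every per-disk parameter by the number of disks, as the io_setup code
# quoted above does, to get the values that apply to the assembled RAID0 volume.
for instance_type, params in io_params.items():
    scaled = {p: v * nr_disks[instance_type] for p, v in params.items()}
    print(instance_type, scaled)
    # i7ie.2xlarge: read_iops 234534 (vs 216,666 in the AWS docs),
    #               write_iops 188336 (vs 173,332 in the AWS docs)
```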
Here are the measurement results for a single drive, the single-drive results multiplied by the number of disks, and a comparison with the AWS specifications. Write IOPS appears to be slower than the AWS specification for 3xlarge and larger sizes; this is because the actual single-drive iotune measurements were slower than the specification.
single drive performance
| instance_type | read_iops | write_iops | nr_disks |
|---|---|---|---|
| i7ie.large | 58449 | 47145 | 1 |
| i7ie.xlarge | 117257 | 94180 | 1 |
| i7ie.2xlarge | 117267 | 94168 | 2 |
| i7ie.3xlarge | 352834 | 119327 | 1 |
| i7ie.6xlarge | 352824 | 119265 | 2 |
| i7ie.12xlarge | 352832 | 118937 | 4 |
| i7ie.18xlarge | 352818 | 119539 | 6 |
| i7ie.24xlarge | 352827 | 119278 | 8 |
| i7ie.48xlarge | 352826 | 119438 | 16 |
single drive performance * nr_disks
| instance_type | read_iops | write_iops | AWS_read_iops[1] | AWS_write_iops[1] |
|---|---|---|---|---|
| i7ie.large | 58449 | 47145 | 54166 | 43333 |
| i7ie.xlarge | 117257 | 94180 | 108333 | 86666 |
| i7ie.2xlarge | 234534 | 188336 | 216666 | 173332 |
| i7ie.3xlarge | 352834 | 119327 | 325000 | 260000 |
| i7ie.6xlarge | 705648 | 238530 | 650000 | 520000 |
| i7ie.12xlarge | 1411328 | 475748 | 1300000 | 1040000 |
| i7ie.18xlarge | 2116908 | 717234 | 1950000 | 1560000 |
| i7ie.24xlarge | 2822616 | 954224 | 2600000 | 2080000 |
| i7ie.48xlarge | 5645216 | 1911008 | 5200000 | 4160000 |
*[1] From AWS specs.
I edited the tables above a bit, just to see the numbers side by side. It looks like our write performance can't keep up with what the node can provide, from i7ie.3xlarge and above! @xemul , @avikivity - thoughts?
@syuu1228 where are the results of the "RAID0" runs (not single disk)? Let's also put them side by side in this table.
In addition, I suggest the following edit to the table to make it more readable:
single drive performance * nr_disks
| instance_type | read_iops | write_iops | AWS_read_iops[1] | AWS_write_iops[1] |
|---|---|---|---|---|
| i7ie.large | 58K | 47K | 54K | 43K |
| i7ie.xlarge | 117K | 94K | 108K | 86K |
| i7ie.2xlarge | 235K | 188K | 217K | 173K |
| i7ie.3xlarge | 353K | 119K | 325K | 260K |
| i7ie.6xlarge | 706K | 239K | 650K | 520K |
| i7ie.12xlarge | 1411K | 476K | 1300K | 1040K |
| i7ie.18xlarge | 2117K | 717K | 1950K | 1560K |
| i7ie.24xlarge | 2823K | 954K | 2600K | 2080K |
| i7ie.48xlarge | 5645K | 1911K | 5200K | 4160K |
Or better:
single drive performance * nr_disks
| instance_type | read_iops | AWS_read_iops[1] | read_delta | write_iops | AWS_write_iops[1] | write_delta |
|---|---|---|---|---|---|---|
| i7ie.large | 58K | 54K | +7% | 47K | 43K | +9% |
| i7ie.xlarge | 117K | 108K | +8% | 94K | 86K | +9% |
| i7ie.2xlarge | 235K | 217K | +8% | 188K | 173K | +9% |
| i7ie.3xlarge | 353K | 325K | +9% | 119K | 260K | -54% |
| i7ie.6xlarge | 706K | 650K | +9% | 239K | 520K | -54% |
| i7ie.12xlarge | 1411K | 1300K | +9% | 476K | 1040K | -54% |
| i7ie.18xlarge | 2117K | 1950K | +9% | 717K | 1560K | -54% |
| i7ie.24xlarge | 2823K | 2600K | +9% | 954K | 2080K | -54% |
| i7ie.48xlarge | 5645K | 5200K | +9% | 1911K | 4160K | -54% |
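For reference, a minimal sketch (assuming the rounded per-instance numbers above) of how the delta columns can be derived:

```python
# Measured (single drive * nr_disks) vs. AWS-documented IOPS, in thousands.
rows = {
    "i7ie.2xlarge": {"read": (235, 217), "write": (188, 173)},
    "i7ie.3xlarge": {"read": (353, 325), "write": (119, 260)},
}

for instance_type, cols in rows.items():
    # Percentage difference of the measured value relative to the documented value.
    deltas = {
        name: round((measured - documented) / documented * 100)
        for name, (measured, documented) in cols.items()
    }
    print(instance_type, deltas)  # e.g. i7ie.3xlarge {'read': 9, 'write': -54}
```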
@syuu1228 where are the results of the "RAID0" runs (not single disk)? Let's also put them side by side in this table.
Here are additional iotune results, measured on the RAID0 volume instead of a single drive (tested with RAID0, running on the Scylla 2025.3.0-dev AMI).
Due to https://github.com/scylladb/scylla-machine-image/issues/723, i7ie.48xlarge was only able to run the benchmark 2 times instead of 3.
| instance_type | read_iops | read_bandwidth | write_iops | write_bandwidth |
|---|---|---|---|---|
| i7ie.large.0 | 58426 | 574715008 | 46887 | 251620544 |
| i7ie.large.1 | 62314 | 574897088 | 47047 | 246629632 |
| i7ie.large.2 | 58437 | 574944960 | 47069 | 247261568 |
| i7ie.xlarge.0 | 117320 | 1148520576 | 94264 | 500478400 |
| i7ie.xlarge.1 | 117225 | 1149501952 | 94215 | 503726432 |
| i7ie.xlarge.2 | 117216 | 1149855872 | 94253 | 502346432 |
| i7ie.2xlarge.0 | 234822 | 2288215808 | 188731 | 1012827200 |
| i7ie.2xlarge.1 | 234855 | 2288272896 | 188179 | 1011206592 |
| i7ie.2xlarge.2 | 234845 | 2288216576 | 188758 | 1007555712 |
| i7ie.3xlarge.0 | 352780 | 3440667648 | 195347 | 1531361024 |
| i7ie.3xlarge.1 | 352777 | 3440119808 | 195994 | 1531312384 |
| i7ie.3xlarge.2 | 352782 | 3438665984 | 195243 | 1531802240 |
| i7ie.6xlarge.0 | 525023 | 6747701760 | 352139 | 3050450688 |
| i7ie.6xlarge.1 | 522884 | 6748208640 | 353432 | 3050736640 |
| i7ie.6xlarge.2 | 525638 | 6747821568 | 358519 | 3050702592 |
| i7ie.12xlarge.0 | 952244 | 8574459904 | 650959 | 6103974400 |
| i7ie.12xlarge.1 | 951841 | 8574929920 | 651901 | 6104414208 |
| i7ie.12xlarge.2 | 949660 | 8572749312 | 651515 | 6104256000 |
| i7ie.18xlarge.0 | 1353136 | 8571264512 | 862961 | 8553358336 |
| i7ie.18xlarge.1 | 1362974 | 8577720832 | 883500 | 8578407424 |
| i7ie.18xlarge.2 | 1353274 | 8567787008 | 859629 | 8565970432 |
| i7ie.24xlarge.0 | 1767879 | 8578226176 | 1039556 | 8578760192 |
| i7ie.24xlarge.1 | 1770136 | 8579100672 | 1045217 | 8578906112 |
| i7ie.24xlarge.2 | 1768215 | 8578795008 | 1041029 | 8579819520 |
| i7ie.48xlarge.0 | 3383705 | 8477860352 | 2044940 | 8475563520 |
| i7ie.48xlarge.1 | 3338124 | 8437319168 | 2032106 | 8404414464 |
| i7ie.metal-24xl.0 | 1790041 | 8577441280 | 1048330 | 8559525888 |
| i7ie.metal-24xl.1 | 1793329 | 8575400960 | 1042127 | 8570143232 |
| i7ie.metal-24xl.2 | 1798110 | 8570591744 | 1039189 | 8576837632 |
| i7ie.metal-48xl.0 | 3510983 | 8540174336 | 2017272 | 8541017600 |
| i7ie.metal-48xl.1 | 3517379 | 8545589760 | 2016191 | 8545735680 |
| i7ie.metal-48xl.2 | 3517068 | 8551666176 | 2036124 | 8549539840 |
| i7ie.large avg | 59725 | 574852352 | 47001 | 248503914 |
| i7ie.xlarge avg | 117253 | 1149292800 | 94244 | 502183754 |
| i7ie.2xlarge avg | 234840 | 2288235093 | 188556 | 1010529834 |
| i7ie.3xlarge avg | 352779 | 3439817813 | 195528 | 1531491882 |
| i7ie.6xlarge avg | 524515 | 6747910656 | 354696 | 3050629973 |
| i7ie.12xlarge avg | 951248 | 8574046378 | 651458 | 6104214869 |
| i7ie.18xlarge avg | 1356461 | 8572257450 | 868696 | 8565912064 |
| i7ie.24xlarge avg | 1768743 | 8578707285 | 1041934 | 8579161941 |
| i7ie.48xlarge avg | 3360914 | 8457589760 | 2038523 | 8439988992 |
| i7ie.metal-24xl avg | 1793826 | 8574477994 | 1043215 | 8568835584 |
| i7ie.metal-48xl avg | 3515143 | 8545810090 | 2023195 | 8545431040 |
The results above (https://github.com/scylladb/scylla-machine-image/issues/608#issuecomment-2956390772) show exactly what I had hoped to see. What prevents us from fixing this issue?
@roydahan @yaronkaikov does it make sense to fix this bug together with this PR?
Yes, this one is going to be fixed as part of your PR, just mention it in your commit and it will be closed.
Closing this issue as it was moved to Jira. Please continue the thread in https://scylladb.atlassian.net/browse/SMI-178