[Bug]: sysbench/tpcc perf continuously decreases during OLTP high-concurrency test.
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Environment
- Version or commit-id (e.g. v0.1.0 or 8b23a93): 27c0e5e2ada81d945d858e2f57c04c7568268afb
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
1. insert/update performance decreases continuously during the OLTP high-concurrency test.
2. point_select performance differs greatly at different times on the same MO instance, e.g.:
first:
second:
Between the two runs, some insert and update tests were executed.
Please assign this issue to @XuPeng-SH
Expected Behavior
No response
Steps to Reproduce
No response
Additional information
No response
@aptend storage optimization with better merge policy
Progress:
env: AMD EPYC 7K83 64-Core Processor 2.5GHz x 16 + 64G
baseline: 6c0e0c5ee89b16b36f547aee6aaa8f6e2341432c
merge-1: https://github.com/aptend/matrixone/commit/6c78e28a5a61ff396d4cab3f989b9166130c4397
merge-2: merge-1 + merge compacted blocks in nonappendable segments.
| process(vuser=100, 10min) | baseline | merge-1 | merge-2 |
|---|---|---|---|
| select-10-100000-prepare | 13116 -> 12066 | 13760 -> 12340 | 14680 -> 12422 |
| update-10-100000-prepare | 523 -> 1053 -> 729 | 680 -> 1128 -> 789 | 647 -> 1093 -> 819 |
| insert-10-100000-prepare | 5213 -> 3968 | 5200 -> 4100 | 6242 -> 4171 |
| select-10-100000-prepare | 1789 -> 4188 | 5929 -> 7249 | 5628 -> 7320 ⭐️ |
| update-10-100000-prepare | 342 -> 720 -> 555 | 454 -> 825 -> 624 | 421 -> 763 -> 667 |
| select-10-100000-prepare | 2000 -> 3169 | 2856 -> 4017 | 3044 -> 5391 ⭐️ |
| idle(15min) | | | |
| select-10-100000-prepare | 3956 - 4000 | 4399 - 4458 | 5888 - 5988 ⭐️ |
env: AMD EPYC 7K83 64-Core Processor 2.5GHz x 16 + 64G
baseline: 6a65b65cecb30816d3565b2551308ab79b6c27cc
merge-1: edee1e1c2dd75841858034eb5a8262126ac5ae68
| process(vuser=100,10min) | baseline | merge-1 |
|---|---|---|
| select-10-1000000-prepare | 8000 -> 8690 | 10129 -> 8700 |
| update-10-1000000-prepare | 300 -> 858 -> 617 | 451 -> 929 -> 654 |
| insert-10-1000000-prepare | 4000 -> 1718 | 4777 -> 3557 ⭐️ |
| select-10-1000000-prepare | 1200 -> 5380 ⭐️ | 2222 -> 4568 |
| update-10-1000000-prepare | 300 -> 600 -> 486 | 371 -> 548 -> 461 |
| select-10-1000000-prepare | 1600 -> 4721 ⭐️ | 1815 -> 3353 |
| Note: | 30 merges | 430 merges |
Only insert benefits from constant merging, and the cost of merging is a concern...
For phenomenon 2 (point_select performance differs greatly at different times on the same MO instance):
One possible cause is that updates create many more blocks, so iterating through all blocks takes longer. I will trace the getBlockInfos method.
After one round of updates, the number of blocks grows to 1000+, increasing the BlockIter cost from 100us to 1ms.
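To illustrate why the iteration cost scales with the block count, here is a minimal sketch, not the real MatrixOne code. `blockInfo`, `candidateBlocks`, and `makeBlocks` are hypothetical names, and the block counts and the 8192-row width are illustrative only. The point is that a point lookup that walks every block's metadata does work proportional to the number of blocks, even when only one block can contain the key.

```go
package main

import "fmt"

// blockInfo is a hypothetical stand-in for per-block metadata;
// it is not the real MatrixOne type.
type blockInfo struct {
	minKey, maxKey int64
}

// candidateBlocks walks every block (no index, no pruning) and reports how
// many blocks were visited and how many may contain key.
func candidateBlocks(blocks []blockInfo, key int64) (visited, matched int) {
	for _, b := range blocks {
		visited++
		if key >= b.minKey && key <= b.maxKey {
			matched++
		}
	}
	return visited, matched
}

// makeBlocks builds n blocks covering consecutive key ranges of the given width.
func makeBlocks(n int, width int64) []blockInfo {
	blocks := make([]blockInfo, 0, n)
	for i := 0; i < n; i++ {
		lo := int64(i) * width
		blocks = append(blocks, blockInfo{minKey: lo, maxKey: lo + width - 1})
	}
	return blocks
}

func main() {
	// Illustrative block counts: the lookup still matches a single block,
	// but the number of visited blocks grows with the total.
	for _, n := range []int{100, 1000} {
		visited, matched := candidateBlocks(makeBlocks(n, 8192), 4242)
		fmt.Printf("blocks=%d visited=%d matched=%d\n", n, visited, matched)
	}
}
```

Under these assumptions, a 10x increase in block count means 10x more metadata entries visited per point select, which is the same order as the 100us to 1ms increase reported above.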
I've given thee courtesy enough -- Hoarah Loux
Setup:
- Writes to logservice removed
- 20 concurrent connections
- 11th Gen Intel(R) Core(TM) i7-11700 @ 2.50GHz x 16 + 16 G
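- In the figure below, the left side is 2800 tps and the right side is 4000 tps.
exec.Run accounts for a small share of the time and is essentially unchanged.
Even with log flushing removed, Commit still shows long-tail fluctuations, and the long tail in Compile is even more severe.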
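Working on stability issues.
Adding end-to-end latency tracing for the commit path.
With logservice writes removed, the main impact observed so far comes from rpc not cancelling the context in the message promptly. At the moment of timeout there are too many goroutines, so other goroutines are not scheduled in time. A fix is being written and tested.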
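The insert performance degradation has now improved considerably.
Debugging pessimistic transactions and optimizing flush and merge.
Working on the related issue https://github.com/matrixorigin/matrixone/issues/10527
Adding metrics.
Refactoring merge-related code.
One of the cases is #12775: sysbench delete is slow.
Standalone test with sysbench, 1,000,000 rows, 100 concurrent threads, 15 min: the performance degradation no longer appears.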
[ 10s ] thds: 100 tps: 4508.63 qps: 4508.63 ...
[ 20s ] thds: 100 tps: 4787.31 qps: 4787.31 ...
[ 30s ] thds: 100 tps: 4700.33 qps: 4700.33 ...
[ 40s ] thds: 100 tps: 4497.15 qps: 4497.15 ...
[ 50s ] thds: 100 tps: 4280.30 qps: 4280.30 ...
[ 60s ] thds: 100 tps: 4200.12 qps: 4200.12 ...
[ 70s ] thds: 100 tps: 4080.87 qps: 4080.87 ...
[ 80s ] thds: 100 tps: 4379.09 qps: 4379.09 ...
[ 90s ] thds: 100 tps: 4547.70 qps: 4547.70 ...
[ 100s ] thds: 100 tps: 4414.33 qps: 4414.33 ...
[ 110s ] thds: 100 tps: 4376.33 qps: 4376.33 ...
[ 120s ] thds: 100 tps: 4315.47 qps: 4315.47 ...
[ 130s ] thds: 100 tps: 4378.28 qps: 4378.28 ...
[ 140s ] thds: 100 tps: 4424.54 qps: 4424.54 ...
[ 150s ] thds: 100 tps: 4262.74 qps: 4262.74 ...
[ 160s ] thds: 100 tps: 4269.99 qps: 4269.99 ...
[ 170s ] thds: 100 tps: 4432.43 qps: 4432.43 ...
[ 180s ] thds: 100 tps: 4394.48 qps: 4394.48 ...
[ 190s ] thds: 100 tps: 4272.31 qps: 4272.31 ...
[ 200s ] thds: 100 tps: 4401.83 qps: 4401.83 ...
[ 210s ] thds: 100 tps: 4382.80 qps: 4382.80 ...
[ 220s ] thds: 100 tps: 4616.85 qps: 4616.85 ...
[ 230s ] thds: 100 tps: 4616.98 qps: 4616.98 ...
[ 240s ] thds: 100 tps: 4535.14 qps: 4535.14 ...
[ 250s ] thds: 100 tps: 4372.20 qps: 4372.20 ...
[ 260s ] thds: 100 tps: 4428.16 qps: 4428.16 ...
[ 270s ] thds: 100 tps: 4297.89 qps: 4297.89 ...
[ 280s ] thds: 100 tps: 4312.50 qps: 4312.50 ...
[ 290s ] thds: 100 tps: 4411.91 qps: 4411.91 ...
[ 300s ] thds: 100 tps: 5032.67 qps: 5032.67 ...
[ 310s ] thds: 100 tps: 5352.36 qps: 5352.36 ...
[ 320s ] thds: 100 tps: 5845.58 qps: 5845.58 ...
[ 330s ] thds: 100 tps: 5693.32 qps: 5693.32 ...
[ 340s ] thds: 100 tps: 5875.50 qps: 5875.50 ...
[ 350s ] thds: 100 tps: 5755.33 qps: 5755.33 ...
[ 360s ] thds: 100 tps: 5824.79 qps: 5824.79 ...
[ 370s ] thds: 100 tps: 5795.73 qps: 5795.73 ...
[ 380s ] thds: 100 tps: 5716.80 qps: 5716.80 ...
[ 390s ] thds: 100 tps: 5592.50 qps: 5592.50 ...
[ 400s ] thds: 100 tps: 5582.21 qps: 5582.21 ...
[ 410s ] thds: 100 tps: 5614.02 qps: 5614.02 ...
[ 420s ] thds: 100 tps: 5689.36 qps: 5689.36 ...
[ 430s ] thds: 100 tps: 5718.97 qps: 5718.97 ...
[ 440s ] thds: 100 tps: 5709.59 qps: 5709.59 ...
[ 450s ] thds: 100 tps: 5858.09 qps: 5858.09 ...
[ 460s ] thds: 100 tps: 5848.95 qps: 5848.95 ...
[ 470s ] thds: 100 tps: 5972.44 qps: 5972.44 ...
[ 480s ] thds: 100 tps: 5831.00 qps: 5831.00 ...
[ 490s ] thds: 100 tps: 5877.16 qps: 5877.16 ...
[ 500s ] thds: 100 tps: 5708.42 qps: 5708.42 ...
[ 510s ] thds: 100 tps: 5644.83 qps: 5644.83 ...
[ 520s ] thds: 100 tps: 5702.69 qps: 5702.69 ...
[ 530s ] thds: 100 tps: 5687.69 qps: 5687.69 ...
[ 540s ] thds: 100 tps: 5852.80 qps: 5852.80 ...
[ 550s ] thds: 100 tps: 5823.04 qps: 5823.04 ...
[ 560s ] thds: 100 tps: 5821.09 qps: 5821.09 ...
[ 570s ] thds: 100 tps: 5764.28 qps: 5764.28 ...
[ 580s ] thds: 100 tps: 5579.16 qps: 5579.16 ...
[ 590s ] thds: 100 tps: 5684.53 qps: 5684.53 ...
[ 600s ] thds: 100 tps: 5702.38 qps: 5702.38 ...
[ 610s ] thds: 100 tps: 5823.26 qps: 5823.26 ...
[ 620s ] thds: 100 tps: 5662.75 qps: 5662.75 ...
[ 630s ] thds: 100 tps: 5594.60 qps: 5594.60 ...
[ 640s ] thds: 100 tps: 5459.33 qps: 5459.33 ...
[ 650s ] thds: 100 tps: 5525.19 qps: 5525.19 ...
[ 660s ] thds: 100 tps: 5619.06 qps: 5619.06 ...
[ 670s ] thds: 100 tps: 5604.78 qps: 5604.78 ...
[ 680s ] thds: 100 tps: 5618.56 qps: 5618.56 ...
[ 690s ] thds: 100 tps: 5560.80 qps: 5560.80 ...
[ 700s ] thds: 100 tps: 5647.27 qps: 5647.27 ...
[ 710s ] thds: 100 tps: 5601.89 qps: 5601.89 ...
[ 720s ] thds: 100 tps: 5454.79 qps: 5454.79 ...
[ 730s ] thds: 100 tps: 5470.28 qps: 5470.28 ...
[ 740s ] thds: 100 tps: 5690.62 qps: 5690.62 ...
[ 750s ] thds: 100 tps: 5512.13 qps: 5512.13 ...
[ 760s ] thds: 100 tps: 5456.98 qps: 5456.98 ...
[ 770s ] thds: 100 tps: 5521.98 qps: 5521.98 ...
[ 780s ] thds: 100 tps: 5522.36 qps: 5522.36 ...
[ 790s ] thds: 100 tps: 5474.18 qps: 5474.18 ...
[ 800s ] thds: 100 tps: 5354.53 qps: 5354.53 ...
[ 810s ] thds: 100 tps: 5866.10 qps: 5866.10 ...
[ 820s ] thds: 100 tps: 5818.19 qps: 5818.19 ...
[ 830s ] thds: 100 tps: 5843.82 qps: 5843.82 ...
[ 840s ] thds: 100 tps: 5859.66 qps: 5859.66 ...
[ 850s ] thds: 100 tps: 5878.11 qps: 5878.11 ...
[ 860s ] thds: 100 tps: 5704.24 qps: 5704.24 ...
[ 870s ] thds: 100 tps: 5828.77 qps: 5828.77 ...
[ 880s ] thds: 100 tps: 5869.70 qps: 5869.70 ...
[ 890s ] thds: 100 tps: 5835.08 qps: 5835.08 ...
[ 900s ] thds: 100 tps: 5804.33 qps: 5804.33 ...
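Continuing to optimize the collection of reader deletes.
The reader deletes change has been merged; waiting to observe the effect.
This requires long-term performance optimization; moving it to 1.2.
Initially, no block has any deletes. As updates accumulate, more and more blocks acquire a delta loc, i.e. a tombstone. Reading data then requires additionally reading the tombstone and applying the deletes, which is why the overhead keeps growing.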
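The extra read-path work can be sketched as follows. This is a simplified model under stated assumptions, not MatrixOne's storage code: `block`, its `tombstone` field, and `readBlock` are hypothetical, and the tombstone is modeled as an in-memory set of deleted row offsets, whereas the real system has to load it from the delta location first.

```go
package main

import "fmt"

// block is a hypothetical, simplified block: row values plus an optional
// tombstone holding the offsets of deleted rows.
type block struct {
	rows      []int64
	tombstone map[int]struct{} // nil means the block has no deletes
}

// readBlock returns the visible rows and the number of tombstone lookups
// performed. Blocks without a tombstone take the fast path with no lookups.
func readBlock(b block) (visible []int64, lookups int) {
	if b.tombstone == nil {
		return b.rows, 0
	}
	visible = make([]int64, 0, len(b.rows))
	for i, v := range b.rows {
		lookups++
		if _, deleted := b.tombstone[i]; !deleted {
			visible = append(visible, v)
		}
	}
	return visible, lookups
}

func main() {
	clean := block{rows: []int64{1, 2, 3, 4}}
	dirty := block{
		rows:      []int64{1, 2, 3, 4},
		tombstone: map[int]struct{}{1: {}, 3: {}}, // rows at offsets 1 and 3 deleted
	}

	v, n := readBlock(clean)
	fmt.Println(v, n) // [1 2 3 4] 0

	v, n = readBlock(dirty)
	fmt.Println(v, n) // [1 3] 4
}
```

In this model, every block that has picked up a tombstone pays a per-row filtering cost on each read, so the total read cost rises as updates spread tombstones across more blocks.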
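On the 128 machine, with a 32 GB memory cache, tpcc degrades far more slowly than with the default configuration. As a follow-up, we can try removing the IO interference and see where the degradation point is.
Three-table refactoring and other requirements.
This is being tracked in other issues; closing this issue.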