Investigating TPCC perf
Some low hanging fruit
Slow creating new NOW function: https://github.com/dolthub/go-mysql-server/blob/6744a0d54c8f4611177f5de0a64d29603139ef7f/sql/expression/function/time.go#L845
Slow executing NOW function (specifically time.ConvertTimeZone)
Comparison Eval could be faster:
The sorts in these queries are unnecessary:
sbt> explain SELECT c_id FROM customer1 WHERE c_w_id = 1 AND c_d_id= 5 AND c_last='ESEEINGABLE' ORDER BY c_first;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project |
| ├─ columns: [customer1.c_id] |
| └─ Sort(customer1.c_first ASC) |
| └─ Filter |
| ├─ (customer1.c_last = 'ESEEINGABLE') |
| └─ IndexedTableAccess(customer1) |
| ├─ index: [customer1.c_w_id,customer1.c_d_id,customer1.c_id] |
| ├─ filters: [{[1, 1], [5, 5], [NULL, ∞)}] |
| └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)
sbt> explain SELECT c_balance, c_first, c_middle, c_id FROM customer2 WHERE c_w_id = 1 AND c_d_id= 1 AND c_last='PRIESEPRES' ORDER BY c_first;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project |
| ├─ columns: [customer2.c_balance, customer2.c_first, customer2.c_middle, customer2.c_id] |
| └─ Sort(customer2.c_first ASC) |
| └─ Filter |
| ├─ (customer2.c_last = 'PRIESEPRES') |
| └─ IndexedTableAccess(customer2) |
| ├─ index: [customer2.c_w_id,customer2.c_d_id,customer2.c_id] |
| ├─ filters: [{[1, 1], [1, 1], [NULL, ∞)}] |
| └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)
sbt> explain SELECT o_id, o_carrier_id, o_entry_d FROM orders2 WHERE o_w_id = 1 AND o_d_id = 1 AND o_c_id = 355 ORDER BY o_id DESC;
+--------------------------------------------------------------------------------------------------+
| plan |
+--------------------------------------------------------------------------------------------------+
| Project |
| ├─ columns: [orders2.o_id, orders2.o_carrier_id, orders2.o_entry_d] |
| └─ Sort(orders2.o_id DESC) |
| └─ Filter |
| ├─ (orders2.o_c_id = 355) |
| └─ IndexedTableAccess(orders2) |
| ├─ index: [orders2.o_w_id,orders2.o_d_id,orders2.o_id] |
| ├─ filters: [{[1, 1], [1, 1], [NULL, ∞)}] |
| └─ columns: [o_id o_d_id o_w_id o_c_id o_entry_d o_carrier_id o_ol_cnt o_all_local] |
+--------------------------------------------------------------------------------------------------+
The blob type should be pruned, deserializing all of the columns is expensive:
sbt> explain SELECT c_id FROM customer1 WHERE c_w_id = 1 AND c_d_id= 5 AND c_last='ESEEINGABLE' ORDER BY c_first;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project |
| ├─ columns: [customer1.c_id] |
| └─ Sort(customer1.c_first ASC) |
| └─ Filter |
| ├─ (customer1.c_last = 'ESEEINGABLE') |
| └─ IndexedTableAccess(customer1) |
| ├─ index: [customer1.c_w_id,customer1.c_d_id,customer1.c_id] |
| ├─ filters: [{[1, 1], [5, 5], [NULL, ∞)}] |
| └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Same here:
Query: `SELECT c_balance, c_first, c_middle, c_id FROM customer2 WHERE c_w_id = 1 AND c_d_id= 1 AND c_last='PRIESEPRES' ORDER BY c_first`,
ExpectedPlan: "Project\n" +
" ├─ columns: [customer2.c_balance:16, customer2.c_first:3, customer2.c_middle:4, customer2.c_id:0!null]\n" +
" └─ Sort(customer2.c_first:3 ASC nullsFirst)\n" +
" └─ Filter\n" +
" ├─ Eq\n" +
" │ ├─ customer2.c_last:5\n" +
" │ └─ PRIESEPRES (longtext)\n" +
" └─ IndexedTableAccess(customer2)\n" +
" ├─ index: [customer2.c_w_id,customer2.c_d_id,customer2.c_id]\n" +
" ├─ static: [{[1, 1], [1, 1], [NULL, ∞)}]\n" +
" └─ Table\n" +
" ├─ name: customer2\n" +
" └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data]\n" +
"",
Is it possible to avoid round tripping the blob type for an update?
Query: `UPDATE customer1 SET c_balance=-1777.000000, c_ytd_payment=1777.000000 WHERE c_w_id = 1 AND c_d_id=5 AND c_id=1838`,
ExpectedPlan: "RowUpdateAccumulator\n" +
" └─ Update\n" +
" └─ UpdateSource(SET customer1.c_balance:16 = -1777.000000,SET customer1.c_ytd_payment:17 = 1777 (decimal(10,6)))\n" +
" └─ IndexedTableAccess(customer1)\n" +
" ├─ index: [customer1.c_w_id,customer1.c_d_id,customer1.c_id]\n" +
" ├─ static: [{[1, 1], [5, 5], [1838, 1838]}]\n" +
" └─ Table\n" +
" ├─ name: customer1\n" +
" └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data]\n" +
"",
Deeper bugs
TPCC doesn't benefit from the same caching as individual sysbench queries:
sbt> SELECT COUNT(DISTINCT (s_i_id)) FROM order_line3, stock3 WHERE ol_w_id = 1 AND ol_d_id = 5 AND ol_o_id < 3003 AND ol_o_id >= 2983 AND s_w_id= 1 AND s_i_id=ol_i_id AND s_quantity < 18;
+--------------------------+
| COUNT(DISTINCT (s_i_id)) |
+--------------------------+
| 24 |
+--------------------------+
1 row in set (1.20 sec)
sbt> SELECT COUNT(DISTINCT (s_i_id)) FROM order_line3, stock3 WHERE ol_w_id = 1 AND ol_d_id = 5 AND ol_o_id < 3003 AND ol_o_id >= 2983 AND s_w_id= 1 AND s_i_id=ol_i_id AND s_quantity < 18;
+--------------------------+
| COUNT(DISTINCT (s_i_id)) |
+--------------------------+
| 24 |
+--------------------------+
1 row in set (0.08 sec)
This query has a particular optimization where the DISTINCT can be transformed into an ORDERED DISTINCT on the order_line3 table (this one is is an edge case, maybe taking a 10ms query to 5ms at best):
sbt> explain SELECT COUNT(DISTINCT (s_i_id)) FROM order_line3, stock3 WHERE ol_w_id = 1 AND ol_d_id = 5 AND ol_o_id < 3003 AND ol_o_id >= 2983 AND s_w_id= 1 AND s_i_id=ol_i_id AND s_quantity < 18;
+------------------------------------------------------------------------------------------------------------+
| plan |
+------------------------------------------------------------------------------------------------------------+
| Project |
| ├─ columns: [countdistinct([stock3.s_i_id])] |
| └─ GroupBy |
| ├─ SelectedExprs(COUNTDISTINCT([stock3.s_i_id])) |
| ├─ Grouping() |
| └─ LookupJoin |
| ├─ IndexedTableAccess(order_line3) |
| │ ├─ index: [order_line3.ol_w_id,order_line3.ol_d_id,order_line3.ol_o_id,order_line3.ol_number] |
| │ ├─ filters: [{[1, 1], [5, 5], [2983, 3003), [NULL, ∞)}] |
| │ └─ columns: [ol_o_id ol_d_id ol_w_id ol_i_id] |
| └─ Filter |
| ├─ ((stock3.s_w_id = 1) AND (stock3.s_quantity < 18)) |
| └─ IndexedTableAccess(stock3) |
| ├─ index: [stock3.s_w_id,stock3.s_i_id] |
| ├─ columns: [s_i_id s_w_id s_quantity] |
| └─ keys: 1, order_line3.ol_i_id |
+------------------------------------------------------------------------------------------------------------+
The sequence of events is:
- Infer
DISTINCT(stock3.s_i_id)~DISTINCT(order_line3.ol_i_id) - Push
DISTINCT(order_line3.ol_i_id)into LHS of join. - Notice that the
ITA(order_line3)will return rows sorted byorder_line3.ol_i_id, so the DISTINCT can be executed in O(1) memory.
doCommit is slow, optimistic lock failures scale super linearly to concurrency:
We scale poorly with multithread, --time=60 --threads={threads} --tables=4 --scale=1 --db-driver=mysql run on macbook pro:
We are still picking a slow HASH_JOIN when a LOOKUP_JOIN is nearly instant (costing will fix this):