Investigating TPCC perf

Open max-hoffman opened this issue 2 years ago • 0 comments

Some low hanging fruit

Slow creating new NOW function: https://github.com/dolthub/go-mysql-server/blob/6744a0d54c8f4611177f5de0a64d29603139ef7f/sql/expression/function/time.go#L845

Slow executing NOW function (specifically time.ConvertTimeZone)

Comparison Eval could be faster:

The sorts in these queries are unnecessary:

sbt> explain SELECT c_id FROM customer1 WHERE c_w_id = 1 AND c_d_id= 5 AND c_last='ESEEINGABLE' ORDER BY c_first;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan                                                                                                                                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project                                                                                                                                                                                                                       |
|  ├─ columns: [customer1.c_id]                                                                                                                                                                                                 |
|  └─ Sort(customer1.c_first ASC)                                                                                                                                                                                               |
|      └─ Filter                                                                                                                                                                                                                |
|          ├─ (customer1.c_last = 'ESEEINGABLE')                                                                                                                                                                                |
|          └─ IndexedTableAccess(customer1)                                                                                                                                                                                     |
|              ├─ index: [customer1.c_w_id,customer1.c_d_id,customer1.c_id]                                                                                                                                                     |
|              ├─ filters: [{[1, 1], [5, 5], [NULL, ∞)}]                                                                                                                                                                        |
|              └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)

sbt> explain SELECT c_balance, c_first, c_middle, c_id FROM customer2 WHERE c_w_id = 1 AND c_d_id= 1 AND c_last='PRIESEPRES' ORDER BY c_first;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan                                                                                                                                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project                                                                                                                                                                                                                       |
|  ├─ columns: [customer2.c_balance, customer2.c_first, customer2.c_middle, customer2.c_id]                                                                                                                                     |
|  └─ Sort(customer2.c_first ASC)                                                                                                                                                                                               |
|      └─ Filter                                                                                                                                                                                                                |
|          ├─ (customer2.c_last = 'PRIESEPRES')                                                                                                                                                                                 |
|          └─ IndexedTableAccess(customer2)                                                                                                                                                                                     |
|              ├─ index: [customer2.c_w_id,customer2.c_d_id,customer2.c_id]                                                                                                                                                     |
|              ├─ filters: [{[1, 1], [1, 1], [NULL, ∞)}]                                                                                                                                                                        |
|              └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.00 sec)

sbt> explain SELECT o_id, o_carrier_id, o_entry_d FROM orders2 WHERE o_w_id = 1 AND o_d_id = 1 AND o_c_id = 355 ORDER BY o_id DESC;
+--------------------------------------------------------------------------------------------------+
| plan                                                                                             |
+--------------------------------------------------------------------------------------------------+
| Project                                                                                          |
|  ├─ columns: [orders2.o_id, orders2.o_carrier_id, orders2.o_entry_d]                             |
|  └─ Sort(orders2.o_id DESC)                                                                      |
|      └─ Filter                                                                                   |
|          ├─ (orders2.o_c_id = 355)                                                               |
|          └─ IndexedTableAccess(orders2)                                                          |
|              ├─ index: [orders2.o_w_id,orders2.o_d_id,orders2.o_id]                              |
|              ├─ filters: [{[1, 1], [1, 1], [NULL, ∞)}]                                           |
|              └─ columns: [o_id o_d_id o_w_id o_c_id o_entry_d o_carrier_id o_ol_cnt o_all_local] |
+--------------------------------------------------------------------------------------------------+

The blob type should be pruned, deserializing all of the columns is expensive:

sbt> explain SELECT c_id FROM customer1 WHERE c_w_id = 1 AND c_d_id= 5 AND c_last='ESEEINGABLE' ORDER BY c_first;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan                                                                                                                                                                                                                          |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project                                                                                                                                                                                                                       |
|  ├─ columns: [customer1.c_id]                                                                                                                                                                                                 |
|  └─ Sort(customer1.c_first ASC)                                                                                                                                                                                               |
|      └─ Filter                                                                                                                                                                                                                |
|          ├─ (customer1.c_last = 'ESEEINGABLE')                                                                                                                                                                                |
|          └─ IndexedTableAccess(customer1)                                                                                                                                                                                     |
|              ├─ index: [customer1.c_w_id,customer1.c_d_id,customer1.c_id]                                                                                                                                                     |
|              ├─ filters: [{[1, 1], [5, 5], [NULL, ∞)}]                                                                                                                                                                        |
|              └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data] |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Same here:

		Query: `SELECT c_balance, c_first, c_middle, c_id FROM customer2 WHERE c_w_id = 1 AND c_d_id= 1 AND c_last='PRIESEPRES' ORDER BY c_first`,
		ExpectedPlan: "Project\n" +
			" ├─ columns: [customer2.c_balance:16, customer2.c_first:3, customer2.c_middle:4, customer2.c_id:0!null]\n" +
			" └─ Sort(customer2.c_first:3 ASC nullsFirst)\n" +
			"     └─ Filter\n" +
			"         ├─ Eq\n" +
			"         │   ├─ customer2.c_last:5\n" +
			"         │   └─ PRIESEPRES (longtext)\n" +
			"         └─ IndexedTableAccess(customer2)\n" +
			"             ├─ index: [customer2.c_w_id,customer2.c_d_id,customer2.c_id]\n" +
			"             ├─ static: [{[1, 1], [1, 1], [NULL, ∞)}]\n" +
			"             └─ Table\n" +
			"                 ├─ name: customer2\n" +
			"                 └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data]\n" +
			"",

Is it possible to avoid round tripping the blob type for an update?

		Query: `UPDATE customer1 SET c_balance=-1777.000000, c_ytd_payment=1777.000000 WHERE c_w_id = 1 AND c_d_id=5 AND c_id=1838`,
		ExpectedPlan: "RowUpdateAccumulator\n" +
			" └─ Update\n" +
			"     └─ UpdateSource(SET customer1.c_balance:16 = -1777.000000,SET customer1.c_ytd_payment:17 = 1777 (decimal(10,6)))\n" +
			"         └─ IndexedTableAccess(customer1)\n" +
			"             ├─ index: [customer1.c_w_id,customer1.c_d_id,customer1.c_id]\n" +
			"             ├─ static: [{[1, 1], [5, 5], [1838, 1838]}]\n" +
			"             └─ Table\n" +
			"                 ├─ name: customer1\n" +
			"                 └─ columns: [c_id c_d_id c_w_id c_first c_middle c_last c_street_1 c_street_2 c_city c_state c_zip c_phone c_since c_credit c_credit_lim c_discount c_balance c_ytd_payment c_payment_cnt c_delivery_cnt c_data]\n" +
			"",

Deeper bugs

TPCC doesn't benefit from the same caching as individual sysbench queries:

sbt> SELECT COUNT(DISTINCT (s_i_id)) FROM order_line3, stock3 WHERE ol_w_id = 1 AND ol_d_id = 5 AND ol_o_id < 3003 AND ol_o_id >= 2983 AND s_w_id= 1 AND s_i_id=ol_i_id AND s_quantity < 18;
+--------------------------+
| COUNT(DISTINCT (s_i_id)) |
+--------------------------+
| 24                       |
+--------------------------+
1 row in set (1.20 sec)

sbt> SELECT COUNT(DISTINCT (s_i_id)) FROM order_line3, stock3 WHERE ol_w_id = 1 AND ol_d_id = 5 AND ol_o_id < 3003 AND ol_o_id >= 2983 AND s_w_id= 1 AND s_i_id=ol_i_id AND s_quantity < 18;
+--------------------------+
| COUNT(DISTINCT (s_i_id)) |
+--------------------------+
| 24                       |
+--------------------------+
1 row in set (0.08 sec)

This query has a particular optimization where the DISTINCT can be transformed into an ORDERED DISTINCT on the order_line3 table (this one is is an edge case, maybe taking a 10ms query to 5ms at best):

sbt> explain SELECT COUNT(DISTINCT (s_i_id)) FROM order_line3, stock3 WHERE ol_w_id = 1 AND ol_d_id = 5 AND ol_o_id < 3003 AND ol_o_id >= 2983 AND s_w_id= 1 AND s_i_id=ol_i_id AND s_quantity < 18;
+------------------------------------------------------------------------------------------------------------+
| plan                                                                                                       |
+------------------------------------------------------------------------------------------------------------+
| Project                                                                                                    |
|  ├─ columns: [countdistinct([stock3.s_i_id])]                                                              |
|  └─ GroupBy                                                                                                |
|      ├─ SelectedExprs(COUNTDISTINCT([stock3.s_i_id]))                                                      |
|      ├─ Grouping()                                                                                         |
|      └─ LookupJoin                                                                                         |
|          ├─ IndexedTableAccess(order_line3)                                                                |
|          │   ├─ index: [order_line3.ol_w_id,order_line3.ol_d_id,order_line3.ol_o_id,order_line3.ol_number] |
|          │   ├─ filters: [{[1, 1], [5, 5], [2983, 3003), [NULL, ∞)}]                                       |
|          │   └─ columns: [ol_o_id ol_d_id ol_w_id ol_i_id]                                                 |
|          └─ Filter                                                                                         |
|              ├─ ((stock3.s_w_id = 1) AND (stock3.s_quantity < 18))                                         |
|              └─ IndexedTableAccess(stock3)                                                                 |
|                  ├─ index: [stock3.s_w_id,stock3.s_i_id]                                                   |
|                  ├─ columns: [s_i_id s_w_id s_quantity]                                                    |
|                  └─ keys: 1, order_line3.ol_i_id                                                           |
+------------------------------------------------------------------------------------------------------------+

The sequence of events is:

Infer DISTINCT(stock3.s_i_id) ~ DISTINCT(order_line3.ol_i_id)
Push DISTINCT(order_line3.ol_i_id) into LHS of join.
Notice that the ITA(order_line3) will return rows sorted by order_line3.ol_i_id, so the DISTINCT can be executed in O(1) memory.

doCommit is slow, optimistic lock failures scale super linearly to concurrency:

We scale poorly with multithread, --time=60 --threads={threads} --tables=4 --scale=1 --db-driver=mysql run on macbook pro: TPCC scaling

We are still picking a slow HASH_JOIN when a LOOKUP_JOIN is nearly instant (costing will fix this):

Nov 09 '23 05:11 max-hoffman