[rowexec] custom rowexec
This PR adds custom Dolt execution operators for lookup joins. When building an execution plan, we try to replace joinIter with a Dolt equivalent that inlines the key building and map get. This is a lot faster because repeatedly building the secondary iterator and materializing sql.Rows in-between lookups are expensive. The cases where we can use map.Get instead of map.PrefixGet for strict key lookups will also perform fewer chunkstore reads.
This PR moves filters in join children to after materializing lookup join rows.
Barring any correctness issues I might be overlooking, ths prototype brings index_join from 5.18 ms/query to 2.64 ms/q, which will be 2.0x MySQL's latency.
Gaps before merging this:
- joins with ITAs on left still broken
- projection edge cases, non covering lookups. Will need to refactor a lot of the
LookupBuilderimplementations to exposeprolly.MapIterthan sql.RowIter - strict key vs non-unique lookups
- maybe a more organized/extendable scaffold before merging this
#benchmark
@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9682864996
@max-hoffman DOLT
| test_name | from_latency_p95 | to_latency_p95 | is_faster |
|---|---|---|---|
| tpcc-scale-factor-1 | 73.13 | 89.16 | 0 |
| test_name | server_name | server_version | tps | test_name | server_name | server_version | tps | is_faster |
|---|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | f7abadf73e214ebf8c1ffdcde0d48ea1c505d383 | 32.99 | tpcc-scale-factor-1 | dolt | c531bcef84b80da341a5720ae526abd667bd7dd5 | 13.52 | 1 |
@max-hoffman DOLT
| read_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| covering_index_scan | 2.76 | 2.81 | 0 |
| groupby_scan | 17.32 | 17.32 | 0 |
| index_join | 5.28 | 2.61 | 1 |
| index_join_scan | 2.57 | 2.57 | 0 |
| index_scan | 53.85 | 54.83 | 0 |
| oltp_point_select | 0.46 | 0.46 | 0 |
| oltp_read_only | 7.56 | 7.7 | 0 |
| select_random_points | 0.77 | 0.77 | 0 |
| select_random_ranges | 0.92 | 0.92 | 0 |
| table_scan | 54.83 | 55.82 | 0 |
| types_table_scan | 142.39 | 142.39 | 0 |
| write_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| oltp_delete_insert | 6.09 | 6.09 | 0 |
| oltp_insert | 3.02 | 3.02 | 0 |
| oltp_read_write | 14.21 | 14.21 | 0 |
| oltp_update_index | 3.13 | 3.07 | 0 |
| oltp_update_non_index | 3.02 | 3.02 | 0 |
| oltp_write_only | 6.43 | 6.43 | 0 |
| types_delete_insert | 6.67 | 6.67 | 0 |
Additional work is required for integration with DoltgreSQL.
#benchmark
@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9687941574
@max-hoffman DOLT
| test_name | from_latency_p95 | to_latency_p95 | is_faster |
|---|---|---|---|
| tpcc-scale-factor-1 | 74.46 | 78.6 | 0 |
| test_name | server_name | server_version | tps | test_name | server_name | server_version | tps | is_faster |
|---|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | 0c4b3f956c2b1f19392c6a688496add36ec521ff | 32.94 | tpcc-scale-factor-1 | dolt | ef8cc8b82f06f9287c74bd5739935bc44df1befc | 32.89 | 0 |
@max-hoffman DOLT
| read_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| covering_index_scan | 2.86 | 2.86 | 0 |
| groupby_scan | 17.01 | 17.32 | 0 |
| index_join | 5.28 | 2.66 | 1 |
| index_join_scan | 2.52 | 2.57 | 0 |
| index_scan | 53.85 | 53.85 | 0 |
| oltp_point_select | 0.44 | 0.46 | 0 |
| oltp_read_only | 7.43 | 7.56 | 0 |
| select_random_points | 0.73 | 0.74 | 0 |
| select_random_ranges | 0.87 | 0.89 | 0 |
| table_scan | 54.83 | 54.83 | 0 |
| types_table_scan | 139.85 | 142.39 | 0 |
| write_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| oltp_delete_insert | 6.09 | 6.09 | 0 |
| oltp_insert | 2.97 | 3.02 | 0 |
| oltp_read_write | 13.7 | 13.95 | 0 |
| oltp_update_index | 3.07 | 3.07 | 0 |
| oltp_update_non_index | 3.02 | 3.02 | 0 |
| oltp_write_only | 6.32 | 6.43 | 0 |
| types_delete_insert | 6.55 | 6.67 | 0 |
@max-hoffman DOLT
| comparing_percentages |
|---|
| 100.000000 to 100.000000 |
| version | result | total |
|---|---|---|
| c50e5b8 | ok | 5937457 |
| version | total_tests |
|---|---|
| c50e5b8 | 5937457 |
| correctness_percentage |
|---|
| 100.0 |
@coffeegoddd DOLT
| comparing_percentages |
|---|
| 100.000000 to 100.000000 |
| version | result | total |
|---|---|---|
| 921e758 | ok | 5937457 |
| version | total_tests |
|---|---|
| 921e758 | 5937457 |
| correctness_percentage |
|---|
| 100.0 |
#benchmark
@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9880357397
@max-hoffman DOLT
| comparing_percentages |
|---|
| 100.000000 to 100.000000 |
| version | result | total |
|---|---|---|
| ef3a0cf | ok | 5937457 |
| version | total_tests |
|---|---|
| ef3a0cf | 5937457 |
| correctness_percentage |
|---|
| 100.0 |
@max-hoffman DOLT
| test_name | from_latency_p95 | to_latency_p95 | is_faster |
|---|---|---|---|
| tpcc-scale-factor-1 | 74.46 | 74.46 | 0 |
| test_name | server_name | server_version | tps | test_name | server_name | server_version | tps | is_faster |
|---|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | 2753e1b20fd606b35574c2d2906bbd90a3e203d3 | 32.49 | tpcc-scale-factor-1 | dolt | ef3a0cfeafd8a2a40b3cd9c659e09d30c07b9689 | 32.63 | 0 |
@max-hoffman DOLT
| read_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| covering_index_scan | 2.97 | 3.02 | 0 |
| groupby_scan | 17.32 | 17.32 | 0 |
| index_join | 5.37 | 5.28 | 0 |
| index_join_scan | 2.57 | 2.57 | 0 |
| index_scan | 53.85 | 53.85 | 0 |
| oltp_point_select | 0.46 | 0.46 | 0 |
| oltp_read_only | 7.7 | 7.7 | 0 |
| select_random_points | 0.75 | 0.75 | 0 |
| select_random_ranges | 0.9 | 0.9 | 0 |
| table_scan | 55.82 | 55.82 | 0 |
| types_table_scan | 142.39 | 142.39 | 0 |
| write_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| oltp_delete_insert | 5.99 | 6.09 | 0 |
| oltp_insert | 2.97 | 2.97 | 0 |
| oltp_read_write | 13.95 | 14.21 | 0 |
| oltp_update_index | 3.07 | 3.07 | 0 |
| oltp_update_non_index | 3.02 | 3.02 | 0 |
| oltp_write_only | 6.32 | 6.43 | 0 |
| types_delete_insert | 6.67 | 6.55 | 0 |
@max-hoffman DOLT
| comparing_percentages |
|---|
| 100.000000 to 100.000000 |
| version | result | total |
|---|---|---|
| c586e8f | ok | 5937457 |
| version | total_tests |
|---|---|
| c586e8f | 5937457 |
| correctness_percentage |
|---|
| 100.0 |
@coffeegoddd DOLT
| comparing_percentages |
|---|
| 100.000000 to 100.000000 |
| version | result | total |
|---|---|---|
| 1a7e001 | ok | 5937457 |
| version | total_tests |
|---|---|
| 1a7e001 | 5937457 |
| correctness_percentage |
|---|
| 100.0 |
@max-hoffman DOLT
| comparing_percentages |
|---|
| 100.000000 to 100.000000 |
| version | result | total |
|---|---|---|
| f5ab6be | ok | 5937457 |
| version | total_tests |
|---|---|
| f5ab6be | 5937457 |
| correctness_percentage |
|---|
| 100.0 |
@coffeegoddd DOLT
| comparing_percentages |
|---|
| 100.000000 to 100.000000 |
| version | result | total |
|---|---|---|
| 327ce5e | ok | 5937457 |
| version | total_tests |
|---|---|
| 327ce5e | 5937457 |
| correctness_percentage |
|---|
| 100.0 |
#benchmark
@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9912163940
@max-hoffman DOLT
| test_name | from_latency_p95 | to_latency_p95 | is_faster |
|---|---|---|---|
| tpcc-scale-factor-1 | 74.46 | 77.19 | 0 |
| test_name | server_name | server_version | tps | test_name | server_name | server_version | tps | is_faster |
|---|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | 3745baa07ca9acaf820685db6675486b0ff880d3 | 32.69 | tpcc-scale-factor-1 | dolt | 327ce5e9d71644e5a079eb1777b86b6f67f53427 | 32.51 | 0 |
@max-hoffman DOLT
| read_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| covering_index_scan | 3.02 | 3.02 | 0 |
| groupby_scan | 17.32 | 17.32 | 0 |
| index_join | 5.28 | 5.28 | 0 |
| index_join_scan | 2.61 | 2.61 | 0 |
| index_scan | 55.82 | 55.82 | 0 |
| oltp_point_select | 0.46 | 0.46 | 0 |
| oltp_read_only | 7.84 | 7.84 | 0 |
| select_random_points | 0.75 | 0.75 | 0 |
| select_random_ranges | 0.9 | 0.9 | 0 |
| table_scan | 56.84 | 57.87 | 0 |
| types_table_scan | 147.61 | 147.61 | 0 |
| write_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| oltp_delete_insert | 6.09 | 6.09 | 0 |
| oltp_insert | 3.02 | 3.02 | 0 |
| oltp_read_write | 14.21 | 14.21 | 0 |
| oltp_update_index | 3.13 | 3.13 | 0 |
| oltp_update_non_index | 3.02 | 3.02 | 0 |
| oltp_write_only | 6.43 | 6.43 | 0 |
| types_delete_insert | 6.67 | 6.67 | 0 |
Trying to figure out how to add a unit test for making sure we do the optimization when we expect. Surprisingly easy to accidentally disable.
@max-hoffman DOLT
| comparing_percentages |
|---|
| 100.000000 to 100.000000 |
| version | result | total |
|---|---|---|
| ba424ee | ok | 5937457 |
| version | total_tests |
|---|---|
| ba424ee | 5937457 |
| correctness_percentage |
|---|
| 100.0 |
#benchmark
@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9914037567
@max-hoffman DOLT
| test_name | from_latency_p95 | to_latency_p95 | is_faster |
|---|---|---|---|
| tpcc-scale-factor-1 | 75.82 | 75.82 | 0 |
| test_name | server_name | server_version | tps | test_name | server_name | server_version | tps | is_faster |
|---|---|---|---|---|---|---|---|---|
| tpcc-scale-factor-1 | dolt | 3745baa07ca9acaf820685db6675486b0ff880d3 | 32.33 | tpcc-scale-factor-1 | dolt | ba424eeb395109be71ef6b7bb33d3b0226e04fca | 32.28 | 0 |
@max-hoffman DOLT
| read_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| covering_index_scan | 3.02 | 3.02 | 0 |
| groupby_scan | 17.32 | 17.32 | 0 |
| index_join | 5.28 | 2.71 | 1 |
| index_join_scan | 2.61 | 2.61 | 0 |
| index_scan | 54.83 | 54.83 | 0 |
| oltp_point_select | 0.46 | 0.46 | 0 |
| oltp_read_only | 7.7 | 7.7 | 0 |
| select_random_points | 0.75 | 0.75 | 0 |
| select_random_ranges | 0.9 | 0.9 | 0 |
| table_scan | 56.84 | 56.84 | 0 |
| types_table_scan | 144.97 | 144.97 | 0 |
| write_tests | from_latency_median | to_latency_median | is_faster |
|---|---|---|---|
| oltp_delete_insert | 5.99 | 5.99 | 0 |
| oltp_insert | 2.97 | 3.02 | 0 |
| oltp_read_write | 13.95 | 13.95 | 0 |
| oltp_update_index | 3.07 | 3.07 | 0 |
| oltp_update_non_index | 2.97 | 3.02 | 0 |
| oltp_write_only | 6.32 | 6.43 | 0 |
| types_delete_insert | 6.55 | 6.67 | 0 |