dolt [rowexec] custom rowexec

This PR adds custom Dolt execution operators for lookup joins. When building an execution plan, we try to replace joinIter with a Dolt equivalent that inlines the key building and map get. This is a lot faster because repeatedly building the secondary iterator and materializing sql.Rows in-between lookups are expensive. The cases where we can use map.Get instead of map.PrefixGet for strict key lookups will also perform fewer chunkstore reads.

This PR moves filters in join children to after materializing lookup join rows.

Barring any correctness issues I might be overlooking, ths prototype brings index_join from 5.18 ms/query to 2.64 ms/q, which will be 2.0x MySQL's latency.

Gaps before merging this:

joins with ITAs on left still broken
projection edge cases, non covering lookups. Will need to refactor a lot of the LookupBuilder implementations to expose prolly.MapIter than sql.RowIter
strict key vs non-unique lookups
maybe a more organized/extendable scaffold before merging this

Jun 25 '24 21:06 max-hoffman

#benchmark

Jun 26 '24 15:06 max-hoffman

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9682864996

Jun 26 '24 15:06 github-actions[bot]

@max-hoffman DOLT

test_name	from_latency_p95	to_latency_p95	is_faster
tpcc-scale-factor-1	73.13	89.16	0

test_name	server_name	server_version	tps	test_name	server_name	server_version	tps	is_faster
tpcc-scale-factor-1	dolt	f7abadf73e214ebf8c1ffdcde0d48ea1c505d383	32.99	tpcc-scale-factor-1	dolt	c531bcef84b80da341a5720ae526abd667bd7dd5	13.52	1

Jun 26 '24 16:06 coffeegoddd

@max-hoffman DOLT

read_tests	from_latency_median	to_latency_median	is_faster
covering_index_scan	2.76	2.81	0
groupby_scan	17.32	17.32	0
index_join	5.28	2.61	1
index_join_scan	2.57	2.57	0
index_scan	53.85	54.83	0
oltp_point_select	0.46	0.46	0
oltp_read_only	7.56	7.7	0
select_random_points	0.77	0.77	0
select_random_ranges	0.92	0.92	0
table_scan	54.83	55.82	0
types_table_scan	142.39	142.39	0

write_tests	from_latency_median	to_latency_median
oltp_delete_insert	6.09	6.09
oltp_insert	3.02	3.02
oltp_read_write	14.21	14.21
oltp_update_index	3.13	3.07
oltp_update_non_index	3.02	3.02
oltp_write_only	6.43	6.43
types_delete_insert	6.67	6.67

Jun 26 '24 17:06 coffeegoddd

Additional work is required for integration with DoltgreSQL.

Jun 26 '24 19:06 github-actions[bot]

#benchmark

Jun 26 '24 23:06 max-hoffman

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9687941574

Jun 26 '24 23:06 github-actions[bot]

@max-hoffman DOLT

test_name	from_latency_p95	to_latency_p95	is_faster
tpcc-scale-factor-1	74.46	78.6	0

test_name	server_name	server_version	tps	test_name	server_name	server_version	tps	is_faster
tpcc-scale-factor-1	dolt	0c4b3f956c2b1f19392c6a688496add36ec521ff	32.94	tpcc-scale-factor-1	dolt	ef8cc8b82f06f9287c74bd5739935bc44df1befc	32.89	0

Jun 26 '24 23:06 coffeegoddd

@max-hoffman DOLT

read_tests	from_latency_median	to_latency_median	is_faster
covering_index_scan	2.86	2.86	0
groupby_scan	17.01	17.32	0
index_join	5.28	2.66	1
index_join_scan	2.52	2.57	0
index_scan	53.85	53.85	0
oltp_point_select	0.44	0.46	0
oltp_read_only	7.43	7.56	0
select_random_points	0.73	0.74	0
select_random_ranges	0.87	0.89	0
table_scan	54.83	54.83	0
types_table_scan	139.85	142.39	0

write_tests	from_latency_median	to_latency_median
oltp_delete_insert	6.09	6.09
oltp_insert	2.97	3.02
oltp_read_write	13.7	13.95
oltp_update_index	3.07	3.07
oltp_update_non_index	3.02	3.02
oltp_write_only	6.32	6.43
types_delete_insert	6.55	6.67

Jun 27 '24 00:06 coffeegoddd

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
c50e5b8	ok	5937457

version	total_tests
c50e5b8	5937457

correctness_percentage
100.0

Jul 10 '24 15:07 coffeegoddd

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
921e758	ok	5937457

version	total_tests
921e758	5937457

correctness_percentage
100.0

Jul 10 '24 15:07 coffeegoddd

#benchmark

Jul 10 '24 19:07 max-hoffman

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9880357397

Jul 10 '24 19:07 github-actions[bot]

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
ef3a0cf	ok	5937457

version	total_tests
ef3a0cf	5937457

correctness_percentage
100.0

Jul 10 '24 20:07 coffeegoddd

@max-hoffman DOLT

test_name	from_latency_p95	to_latency_p95	is_faster
tpcc-scale-factor-1	74.46	74.46	0

test_name	server_name	server_version	tps	test_name	server_name	server_version	tps	is_faster
tpcc-scale-factor-1	dolt	2753e1b20fd606b35574c2d2906bbd90a3e203d3	32.49	tpcc-scale-factor-1	dolt	ef3a0cfeafd8a2a40b3cd9c659e09d30c07b9689	32.63	0

Jul 10 '24 20:07 coffeegoddd

@max-hoffman DOLT

read_tests	from_latency_median	to_latency_median
covering_index_scan	2.97	3.02
groupby_scan	17.32	17.32
index_join	5.37	5.28
index_join_scan	2.57	2.57
index_scan	53.85	53.85
oltp_point_select	0.46	0.46
oltp_read_only	7.7	7.7
select_random_points	0.75	0.75
select_random_ranges	0.9	0.9
table_scan	55.82	55.82
types_table_scan	142.39	142.39

write_tests	from_latency_median	to_latency_median
oltp_delete_insert	5.99	6.09
oltp_insert	2.97	2.97
oltp_read_write	13.95	14.21
oltp_update_index	3.07	3.07
oltp_update_non_index	3.02	3.02
oltp_write_only	6.32	6.43
types_delete_insert	6.67	6.55

Jul 10 '24 21:07 coffeegoddd

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
c586e8f	ok	5937457

version	total_tests
c586e8f	5937457

correctness_percentage
100.0

Jul 12 '24 15:07 coffeegoddd

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
1a7e001	ok	5937457

version	total_tests
1a7e001	5937457

correctness_percentage
100.0

Jul 12 '24 15:07 coffeegoddd

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
f5ab6be	ok	5937457

version	total_tests
f5ab6be	5937457

correctness_percentage
100.0

Jul 12 '24 17:07 coffeegoddd

@coffeegoddd DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
327ce5e	ok	5937457

version	total_tests
327ce5e	5937457

correctness_percentage
100.0

Jul 12 '24 18:07 coffeegoddd

#benchmark

Jul 12 '24 18:07 max-hoffman

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9912163940

Jul 12 '24 18:07 github-actions[bot]

@max-hoffman DOLT

test_name	from_latency_p95	to_latency_p95	is_faster
tpcc-scale-factor-1	74.46	77.19	0

test_name	server_name	server_version	tps	test_name	server_name	server_version	tps	is_faster
tpcc-scale-factor-1	dolt	3745baa07ca9acaf820685db6675486b0ff880d3	32.69	tpcc-scale-factor-1	dolt	327ce5e9d71644e5a079eb1777b86b6f67f53427	32.51	0

Jul 12 '24 18:07 coffeegoddd

@max-hoffman DOLT

read_tests	from_latency_median	to_latency_median
covering_index_scan	3.02	3.02
groupby_scan	17.32	17.32
index_join	5.28	5.28
index_join_scan	2.61	2.61
index_scan	55.82	55.82
oltp_point_select	0.46	0.46
oltp_read_only	7.84	7.84
select_random_points	0.75	0.75
select_random_ranges	0.9	0.9
table_scan	56.84	57.87
types_table_scan	147.61	147.61

write_tests	from_latency_median	to_latency_median
oltp_delete_insert	6.09	6.09
oltp_insert	3.02	3.02
oltp_read_write	14.21	14.21
oltp_update_index	3.13	3.13
oltp_update_non_index	3.02	3.02
oltp_write_only	6.43	6.43
types_delete_insert	6.67	6.67

Jul 12 '24 19:07 coffeegoddd

Trying to figure out how to add a unit test for making sure we do the optimization when we expect. Surprisingly easy to accidentally disable.

Jul 12 '24 19:07 max-hoffman

@max-hoffman DOLT

comparing_percentages
100.000000 to 100.000000

version	result	total
ba424ee	ok	5937457

version	total_tests
ba424ee	5937457

correctness_percentage
100.0

Jul 12 '24 20:07 coffeegoddd

#benchmark

Jul 12 '24 20:07 max-hoffman

@max-hoffman workflow run: https://github.com/dolthub/dolt/actions/runs/9914037567

Jul 12 '24 20:07 github-actions[bot]

@max-hoffman DOLT

test_name	from_latency_p95	to_latency_p95	is_faster
tpcc-scale-factor-1	75.82	75.82	0

test_name	server_name	server_version	tps	test_name	server_name	server_version	tps	is_faster
tpcc-scale-factor-1	dolt	3745baa07ca9acaf820685db6675486b0ff880d3	32.33	tpcc-scale-factor-1	dolt	ba424eeb395109be71ef6b7bb33d3b0226e04fca	32.28	0

Jul 12 '24 21:07 coffeegoddd

@max-hoffman DOLT

read_tests	from_latency_median	to_latency_median	is_faster
covering_index_scan	3.02	3.02	0
groupby_scan	17.32	17.32	0
index_join	5.28	2.71	1
index_join_scan	2.61	2.61	0
index_scan	54.83	54.83	0
oltp_point_select	0.46	0.46	0
oltp_read_only	7.7	7.7	0
select_random_points	0.75	0.75	0
select_random_ranges	0.9	0.9	0
table_scan	56.84	56.84	0
types_table_scan	144.97	144.97	0

write_tests	from_latency_median	to_latency_median
oltp_delete_insert	5.99	5.99
oltp_insert	2.97	3.02
oltp_read_write	13.95	13.95
oltp_update_index	3.07	3.07
oltp_update_non_index	2.97	3.02
oltp_write_only	6.32	6.43
types_delete_insert	6.55	6.67

Jul 12 '24 22:07 coffeegoddd