dolt icon indicating copy to clipboard operation
dolt copied to clipboard

Incorrect SemiJoin in plan tests

Open nicktobey opened this issue 7 months ago • 2 comments

One of our plan tests looks like this:

			"create table vals (val int unique key);",
			"create table ranges (min int unique key, max int, unique key(min,max));",
			"insert into vals values (0), (1), (2), (3), (4), (5), (6);",
			"insert into ranges values (0,2), (1,3), (2,4), (3,5), (4,6);",
				q:     "select * from vals where exists (select * from ranges where val between min and max limit 1 offset 1);",
				types: []plan.JoinType{plan.JoinTypeSemi},
				exp: []sql.Row{
					{1},
					{2},
					{3},
					{4},
					{5},
					{6},
				},

This is not the correct output, because {6} should not be in the result row. This query is supposed to match against vals that are in two of the ranges in ranges, while 6 is only in a single range.

Playing around with it, different values of offset produce different, also wrong results.

This is the generated plan:

+-------------------------------------------------------------+
| plan                                                        |
+-------------------------------------------------------------+
| SemiJoin                                                    |
|  ├─ ((vals.val >= ranges.min) AND (vals.val <= ranges.max)) |
|  ├─ Table                                                   |
|  │   └─ name: vals                                          |
|  └─ Offset(1)                                               |
|      └─ Table                                               |
|          ├─ name: ranges                                    |
|          └─ columns: [min max]                              |
+-------------------------------------------------------------+

The plan looks correct. We must not be executing the plan correctly.

I discovered this while writing tests for an unrelated issue, so I'm temporarily disabling this test while I fix the original issue, then coming back around for this.

nicktobey avatar Jan 19 '24 17:01 nicktobey

The plan looks correct. We must not be executing the plan correctly.

I'm just thick. This plan is not correct. I don't think we can safely apply the SemiJoin optimization here. The use of OFFSET is intended to require that the LHS matched multiple rows on the RHS, but there's no way to actually encode that information in the SemiJoin.

nicktobey avatar Jan 21 '24 02:01 nicktobey

The plan looks correct. We must not be executing the plan correctly.

I'm just thick. This plan is not correct. I don't think we can safely apply the SemiJoin optimization here. The use of OFFSET is intended to require that the LHS matched multiple rows on the RHS, but there's no way to actually encode that information in the SemiJoin.

We have something similar for lookup joins, though those also might be missing OFFSET:

func dfsLookupCandidates(rel memo.RelExpr, limitOk bool) (sql.TableId, []*memo.Index, []sql.Expression, bool) {
	if rel == nil {
		return 0, nil, nil, false
	}
	if !limitOk && rel.Group().RelProps.Limit != nil {
		// LOOKUP through a LIMIT is invalid
		return 0, nil, nil, false
	}

max-hoffman avatar Jan 22 '24 19:01 max-hoffman