turso icon indicating copy to clipboard operation
turso copied to clipboard

feat: use covering indexes

Open KKould opened this issue 1 year ago • 3 comments

issue: https://github.com/tursodatabase/limbo/issues/364

this pr enables limbo to detect whether the relevant columns in the plan can be index covered while the index is hitting, thereby reducing the need to return to the table.

TODO:

  • [ ] : Test for related_columns (I refer to other optimization rules but cannot find relevant tests. Do you have any suggestions on this?)
  • [ ] : FIXME: row_id is displayed as column in explain
explain select id, age from users where age > 90 limit 1;

addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     14    0                    0   Start at 14
1     OpenReadAsync      0     2     0                    0   table=users, root=2
2     OpenReadAwait      0     0     0                    0   
3     OpenReadAsync      1     274   0                    0   table=age_idx, root=274
4     OpenReadAwait      0     0     0                    0   
5     Integer            90    1     0                    0   r[1]=90
6     SeekGT             1     13    1                    0   
7       Column           1     1     2                    0   r[2]=age_idx.column 1
8       Column           1     0     3                    0   r[3]=age_idx.age
9       ResultRow        2     2     0                    0   output=r[2..3]
10      DecrJumpZero     4     13    0                    0   if (--r[4]==0) goto 13
11    NextAsync          1     0     0                    0   
12    NextAwait          1     7     0                    0   
13    Halt               0     0     0                    0   
14    Transaction        0     0     0                    0   
15    Integer            1     4     0                    0   r[4]=1
16    Goto               0     1     0                    0   

before:

explain select id, age from users where age > 90 limit 1;

addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     15    0                    0   Start at 15
1     OpenReadAsync      0     2     0                    0   table=users, root=2
2     OpenReadAwait      0     0     0                    0   
3     OpenReadAsync      1     274   0                    0   table=age_idx, root=274
4     OpenReadAwait      0     0     0                    0   
5     Integer            90    1     0                    0   r[1]=90
6     SeekGT             1     14    1                    0   
7       DeferredSeek     1     0     0                    0   
8       RowId            0     2     0                    0   r[2]=users.rowid
9       Column           0     9     3                    0   r[3]=users.age
10      ResultRow        2     2     0                    0   output=r[2..3]
11      DecrJumpZero     4     14    0                    0   if (--r[4]==0) goto 14
12    NextAsync          1     0     0                    0   
13    NextAwait          1     7     0                    0   
14    Halt               0     0     0                    0   
15    Transaction        0     0     0                    0   
16    Integer            1     4     0                    0   r[4]=1
17    Goto               0     1     0                    0   

KKould avatar Jan 01 '25 17:01 KKould

What do P1, P2, P3, P4, and P5 represent?

loloxwg avatar Jan 02 '25 02:01 loloxwg

the current CI error is because select id, age from users where age > 90 limit 1; does not use the index, so its results are inconsistent with the results using the index (https://github.com/tursodatabase/limbo/pull/593), but I can successfully run make test-compat locally (my pc windows & codespace ubuntu)

KKould avatar Jan 02 '25 05:01 KKould

Hi, I've read this and I think it's a great effort! Mainly I would like to see a bit of a different approach... here are some very raw thoughts, and not all of them might make sense on the implementation level with our current code, but here goes anyway:

  • the decision that we have a covering index could be made in optimizer already, so the table cursor would never be opened in the emitter/codegen phase because we know we dont need it.
  • we do need some kind of mapping between table columns and index columns as you have done here, but perhaps it could already be done on the Schema level, and not ad hoc in emitter. I.e. since we already know what the tables and indexes are when we start processing a query, we should already have the table-to-index column mappings available, technically.
  • Instead of having index_cover_cursors per se, we could distinguish between table_cursors and index_cursors in Program on a general level, and make the decision to read either from the table cursor or the index cursor depending on what our column mapping tells us.

Also sorry about the million merge conflicts...

jussisaurio avatar Jan 05 '25 14:01 jussisaurio

@jussisaurio Thank you for your comment, I think it is very valuable.I made the following changes based on your comment

  • When index cover is satisfied, just open the index cursor and replace the table cursor
  • I'm get the column remapping process to happen earlier when adding an index cover in Index::from_sql
  • index_cover_cursors was renamed to table_remapping_cursors, removing the table cursor id

If this modification meets your expectations, I will deal with the code conflicts later.

KKould avatar Jan 17 '25 11:01 KKould

This pull request has been marked as stale due to inactivity. It will be closed in 7 days if no further activity occurs.

penberg avatar Feb 17 '25 00:02 penberg