couchbase-lite-core
couchbase-lite-core copied to clipboard
Wrong index being chosen
Attached is a small project that will choose an index that selects an incorrect index which causes a scan of the entire database. The IX_User_ClientRowId index is being chosen even though the IX_OrderHistory_CompleteDate should be used.
What would really help here are one or both of two things: 1) being able to specify index hints; 2) the ability to create partial indexes (particularly by document type). However, in this case, I just need some solution to allow this and other similar queries to complete in a timely manner.
Hm, I don't have Windows, or any experience in .NET development. Maybe @borrrden can take a look?
This project references a protected Nuget package called Telerik.Windows.Documents.Spreadsheet.FormatProviders.OpenXml.for.Net40 and so I am unable to build it since I do not have a Telerik account.
Sorry @borrrden . Here is an updated version that eliminates that reference. AOTEX.Console.zip
Here is the output from the program
Testing open tasks
kv_default_seqs
IX_TypeId
IX_Client_Key
IX_OrderHistory_OrderRowId
IX_OrderHistory_CompleteDate
IX_User_ClientRowId
SELECT fl_result(fl_value("O".body, 'RowId')) FROM kv_default AS "H" CROSS JOIN kv_default AS "O" ON (fl_value("O".body, 'CurrentTaskId') = fl_value("H".body, 'OrderHistory')) AND ("O".flags & 1) = 0 WHERE (fl_value("H".body, 'Type') = 'OrderHistory' AND (fl_value("H".body, 'CompleteDate') = x'' OR fl_value("H".body, 'CompleteDate') IS NULL)) AND ("H".flags & 1) = 0
0|0|0| SEARCH TABLE kv_default AS H USING INDEX IX_User_ClientRowId (<expr>=?)
0|1|1| SCAN TABLE kv_default AS O
Done
BTW this spun off from this forum thread.
Output of query explain plain from SQLite is here
That's the output I see too. It needs to use the other index. For this query, the chosen index results in a large scan that takes minutes.
Get Outlook for Androidhttps://aka.ms/ghei36
From: Jim Borden [email protected] Sent: Thursday, January 31, 2019 7:16:55 PM To: couchbase/couchbase-lite-core Cc: Brian Limkemann; Author Subject: Re: [couchbase/couchbase-lite-core] Wrong index being chosen (#683)
Output of query explain plain from SQLite is herehttps://gist.github.com/borrrden/6cdbbb40a84b45b5a7c387ab273beac7
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/couchbase/couchbase-lite-core/issues/683#issuecomment-459557630, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJ6LVWg-BKafeJ6aonzohgfZpS9pSNQJks5vI4d3gaJpZM4adNxv.
By the way is this as minimal as this case can get? There are several indexes in there with no clue as to how they are created since all that logic is provided in a compiled binary.
However, judging by the forum I think that having the type property on the first portion of all the indexes might be confusing SQLite so as a workaround you could try removing that and only indexing the property in question. This will result in a slight performance hit but in theory not as bad as the one you are facing now. We will continue to analyze what we've found from the database that was created by the program.
@borrrden I changed the index code to remove ALL of the indexes except two: one on Type and RowId and one on just the CompleteDate as you suggested. The TypeId index is still chosen completely disregarding the CompleteDate index.
If you replace MainDatabase.SetupDatabase with this code you will see this as well: protected override void SetupDatabase() { foreach (var n in Database.GetIndexes()) { if (n != "IX_TypeId") Database.DeleteIndex(n); }
// Configuration item indexes
//Database.CreateIndex("IX_Client_Key", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
// ValueIndexItem.Expression(Expression.Property(nameof(Client.Key)))));
// Order indexes
//Database.CreateIndex("IX_Order_Client", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type)).EqualTo(Expression.String(nameof(Order)))),
// ValueIndexItem.Expression(Expression.Property(nameof(Order.ClientId)))));
//Database.CreateIndex("IX_Order_ClientContactId", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type)).EqualTo(Expression.String(nameof(Order)))),
// ValueIndexItem.Expression(Expression.Property(nameof(Order.ClientContactId)))));
//Database.CreateIndex("IX_Order_OrderNumber", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type)).EqualTo(Expression.String(nameof(Order)))),
// ValueIndexItem.Expression(Expression.Property(nameof(Order.OrderNumber)))));
//Database.CreateIndex("IX_Order_CurrentHistory", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type)).EqualTo(Expression.String(nameof(Order)))),
// ValueIndexItem.Expression(Expression.Property(nameof(Order.CurrentTaskId)))));
// OrderHistory
//Database.CreateIndex("IX_OrderHistory_OrderRowId", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
// ValueIndexItem.Expression(Expression.Property(nameof(OrderHistory.OrderRowId)))));
Database.CreateIndex("IX_OrderHistory_CompleteDate", IndexBuilder.ValueIndex(//ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
ValueIndexItem.Expression(Expression.Property(nameof(OrderHistory.CompleteDate)))));
// User indexes
//Database.CreateIndex("IX_User_ClientRowId", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
// ValueIndexItem.Expression(Expression.Property(nameof(User.ClientRowId)))));
}
Try removing Type from the Type+RowID index. It may be inducing SQLite to choose that index because your query filters on that property.
FYI, related issues/enhancements:
Here's the query, copied from the forum thread:
QueryBuilder.Select(SelectResult.Expression(Expression.Property(nameof(Order.RowId)).From("O")))
.From(DataSource.Database(Manager.MainDb.Database).As("H"))
.Join(Join.InnerJoin(DataSource.Database(Manager.MainDb.Database).As("O"))
On(Expression.Property(nameof(Order.CurrentTaskId)).From("O").EqualTo(Expression.Property(nameof(OrderHistory.OrderRowId)).From("H"))))
.Where(DatabaseBase.CreateWhereDocumentTypeEquals(nameof(OrderHistory), "H")
.And(Expression.Property(nameof(OrderHistory.CompleteDate)).From("H").IsNullOrMissing()))
Which give this using Explain():
SELECT fl_result(fl_value("O".body, 'RowId')) FROM kv_default AS "H" CROSS JOIN kv_default AS "O" ON (fl_value("O".body, 'CurrentTaskId') = fl_value("H".body, 'OrderRowId')) AND ("O".flags & 1) = 0 WHERE (fl_value("H".body, 'Type') = 'OrderHistory' AND (fl_value("H".body, 'CompleteDate') = x'' OR fl_value("H".body, 'CompleteDate') IS NULL)) AND ("H".flags & 1) = 0
0|0|0| SEARCH TABLE kv_default AS H **USING INDEX IX_User_ClientRowId (<expr>=?)**
0|1|1| SCAN TABLE kv_default AS O
I have these indexes created (among others):
Database.CreateIndex("IX_OrderHistory_CompleteDate", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
ValueIndexItem.Expression(Expression.Property(nameof(OrderHistory.CompleteDate)))));
// User indexes
Database.CreateIndex("IX_User_ClientRowId", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
ValueIndexItem.Expression(Expression.Property(nameof(User.ClientRowId)))));
Why isn't it choosing IX_OrderHistory_CompleteDate? The query never ever uses the UserClientRowId field!
Remove all the other indexes worked just fine. But, how can I now look up a specific document by type/rowid which is the primary key (unique value) for each document? I also need to be able to do that.
Yeah, this isn't a true workaround, but it gives another data point.
In general, the "primary key" of your data should be stored as the documentID. That's because
- The documentID is the only thing that can be guaranteed unique
- There is implicitly an index on it
- There's a built-in API call to get a document given its documentID
If you have multiple primary keys, like (type, rowid), the convention is to concatenate them with some unambiguous delimiter like ":", "`,'", etc.
The document id is the same as a sting concatenating my Type and RowId fields with an _ (e.g., "Type_Guidvalue"). So that is automatically created as a primary key?
Yes, because Database.getDocument() needs to be fast.
(Internally, the documentID is the primary key of the SQLite table that stores the documents.)
So that is the 'key' column in the kv_default table? And that is used in lieu of any other search criteria? In other words, that works like any other relational database primary key and thus doesn't need to be considered at all in our user code?
Yup.
Note: It doesn't appear that SQLite is choosing that index for any particular reason. It is just choosing whatever index was created last in the database. If I move the code to create the user indexes first then I get this.
Testing open tasks
kv_default_seqs
IX_TypeId
IX_Client_Key
IX_User_ClientRowId
IX_OrderHistory_OrderRowId
IX_OrderHistory_CompleteDate
SELECT fl_result(fl_value("O".body, 'RowId')) FROM kv_default AS "H" CROSS JOIN kv_default AS "O" ON (fl_value("O".body, 'CurrentTaskId') = fl_value("H".body, 'OrderHistory')) AND ("O".flags & 1) = 0 WHERE (fl_value("H".body, 'Type') = 'OrderHistory' AND (fl_value("H".body, 'CompleteDate') = x'' OR fl_value("H".body, 'CompleteDate') IS NULL)) AND ("H".flags & 1) = 0
0|0|0| SEARCH TABLE kv_default AS H USING INDEX IX_OrderHistory_CompleteDate (<expr>=?)
0|1|1| SCAN TABLE kv_default AS O
If I switch the order of the order history indexes I get this
Testing open tasks
IX_TypeId
IX_Client_Key
IX_User_ClientRowId
IX_OrderHistory_CompleteDate
IX_OrderHistory_OrderRowId
SELECT fl_result(fl_value("O".body, 'RowId')) FROM kv_default AS "H" CROSS JOIN kv_default AS "O" ON (fl_value("O".body, 'CurrentTaskId') = fl_value("H".body, 'OrderHistory')) AND ("O".flags & 1) = 0 WHERE (fl_value("H".body, 'Type') = 'OrderHistory' AND (fl_value("H".body, 'CompleteDate') = x'' OR fl_value("H".body, 'CompleteDate') IS NULL)) AND ("H".flags & 1) = 0
0|0|0| SEARCH TABLE kv_default AS H USING INDEX IX_OrderHistory_OrderRowId (<expr>=?)
0|1|1| SCAN TABLE kv_default AS O
I don't think SQLite has enough information to make an educated choice about which index to use
Also of note is that IX_OrderHistory_CompleteDate appears in the extended explain output, but no other index does
e.g.
[
5,
"ReopenIdx",
2,
10,
0,
"k(3,,,)",
"02",
"root=10 iDb=0; IX_OrderHistory_CompleteDate"
],
So is this a case where being able to specify an index hint would be useful? I think so - sometimes we are smarter than the database.
I'm not sure anymore. I wonder if the explain output is actually correct. The extended output was from the original setup which reported using CompleteDate as it should, while explain said that it was using ClientRowId.
Migrated to CBL-69
Whoops, keeping this one open so that third party comments can still come in.
Any plans on when this feature would be implemented? I would like to create partial indexes for certain long-running queries (Using a Where condition and Order by), something like this:
IndexBuilder.ValueIndex(
ValueIndexItem.Expression(Expression.Property("type").EqualTo(Expression.String("contact"))),
ValueIndexItem.Expression(Expression.Not(Expression.Property("isDeleted"))),
ValueIndexItem.Property("tenantId"),
ValueIndexItem.Property("status"),
ValueIndexItem.Property("creationDate")) // creationDate DESC
); ```
It would need to be prioritized by our project management. That's driven in part by customer demand, so if you're a Couchbase customer you should definitely let your SE know this feature is important to you.
Ok, thanks for the information. I will contact them.
Maybe not relate to this issue, but I got a performance issue with the multiple joins (my query joins 8 docs) and it takes about 3 or 4 seconds, I know we shouldn't join many document in NoSQL, but I have no other choice. So I'm checking a temporary solution is to delete all existing index and recreating again (I see c4db_maintenance(db, kC4Reindex, &err) doesn't work), this temp solution worked now a query only takes 0.5 s !!! but only in release mode build so I don't understand why this doesn't work in debug mode build. Do you have any ideas about difference?