couchbase-lite-core icon indicating copy to clipboard operation
couchbase-lite-core copied to clipboard

Wrong index being chosen

Open blimkemann opened this issue 6 years ago • 28 comments

Attached is a small project that will choose an index that selects an incorrect index which causes a scan of the entire database. The IX_User_ClientRowId index is being chosen even though the IX_OrderHistory_CompleteDate should be used.

AOTEX.Console.zip

What would really help here are one or both of two things: 1) being able to specify index hints; 2) the ability to create partial indexes (particularly by document type). However, in this case, I just need some solution to allow this and other similar queries to complete in a timely manner.

blimkemann avatar Jan 31 '19 21:01 blimkemann

Hm, I don't have Windows, or any experience in .NET development. Maybe @borrrden can take a look?

snej avatar Jan 31 '19 22:01 snej

This project references a protected Nuget package called Telerik.Windows.Documents.Spreadsheet.FormatProviders.OpenXml.for.Net40 and so I am unable to build it since I do not have a Telerik account.

borrrden avatar Jan 31 '19 23:01 borrrden

Sorry @borrrden . Here is an updated version that eliminates that reference. AOTEX.Console.zip

blimkemann avatar Jan 31 '19 23:01 blimkemann

Here is the output from the program

Testing open tasks
kv_default_seqs
IX_TypeId
IX_Client_Key
IX_OrderHistory_OrderRowId
IX_OrderHistory_CompleteDate
IX_User_ClientRowId
SELECT fl_result(fl_value("O".body, 'RowId')) FROM kv_default AS "H" CROSS JOIN kv_default AS "O" ON (fl_value("O".body, 'CurrentTaskId') = fl_value("H".body, 'OrderHistory')) AND ("O".flags & 1) = 0 WHERE (fl_value("H".body, 'Type') = 'OrderHistory' AND (fl_value("H".body, 'CompleteDate') = x'' OR fl_value("H".body, 'CompleteDate') IS NULL)) AND ("H".flags & 1) = 0
0|0|0| SEARCH TABLE kv_default AS H USING INDEX IX_User_ClientRowId (<expr>=?)
0|1|1| SCAN TABLE kv_default AS O

Done

borrrden avatar Jan 31 '19 23:01 borrrden

BTW this spun off from this forum thread.

snej avatar Feb 01 '19 00:02 snej

Output of query explain plain from SQLite is here

borrrden avatar Feb 01 '19 00:02 borrrden

That's the output I see too. It needs to use the other index. For this query, the chosen index results in a large scan that takes minutes.

Get Outlook for Androidhttps://aka.ms/ghei36


From: Jim Borden [email protected] Sent: Thursday, January 31, 2019 7:16:55 PM To: couchbase/couchbase-lite-core Cc: Brian Limkemann; Author Subject: Re: [couchbase/couchbase-lite-core] Wrong index being chosen (#683)

Output of query explain plain from SQLite is herehttps://gist.github.com/borrrden/6cdbbb40a84b45b5a7c387ab273beac7

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHubhttps://github.com/couchbase/couchbase-lite-core/issues/683#issuecomment-459557630, or mute the threadhttps://github.com/notifications/unsubscribe-auth/AJ6LVWg-BKafeJ6aonzohgfZpS9pSNQJks5vI4d3gaJpZM4adNxv.

blimkemann avatar Feb 01 '19 00:02 blimkemann

By the way is this as minimal as this case can get? There are several indexes in there with no clue as to how they are created since all that logic is provided in a compiled binary.

However, judging by the forum I think that having the type property on the first portion of all the indexes might be confusing SQLite so as a workaround you could try removing that and only indexing the property in question. This will result in a slight performance hit but in theory not as bad as the one you are facing now. We will continue to analyze what we've found from the database that was created by the program.

borrrden avatar Feb 01 '19 01:02 borrrden

@borrrden I changed the index code to remove ALL of the indexes except two: one on Type and RowId and one on just the CompleteDate as you suggested. The TypeId index is still chosen completely disregarding the CompleteDate index.

If you replace MainDatabase.SetupDatabase with this code you will see this as well: protected override void SetupDatabase() { foreach (var n in Database.GetIndexes()) { if (n != "IX_TypeId") Database.DeleteIndex(n); }

		// Configuration item indexes
		//Database.CreateIndex("IX_Client_Key", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
		//															  ValueIndexItem.Expression(Expression.Property(nameof(Client.Key)))));

		// Order indexes
		//Database.CreateIndex("IX_Order_Client", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type)).EqualTo(Expression.String(nameof(Order)))),
		//																ValueIndexItem.Expression(Expression.Property(nameof(Order.ClientId)))));
		//Database.CreateIndex("IX_Order_ClientContactId", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type)).EqualTo(Expression.String(nameof(Order)))),
		//																		 ValueIndexItem.Expression(Expression.Property(nameof(Order.ClientContactId)))));
		//Database.CreateIndex("IX_Order_OrderNumber", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type)).EqualTo(Expression.String(nameof(Order)))),
		//																	 ValueIndexItem.Expression(Expression.Property(nameof(Order.OrderNumber)))));
		//Database.CreateIndex("IX_Order_CurrentHistory", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type)).EqualTo(Expression.String(nameof(Order)))),
		//																		ValueIndexItem.Expression(Expression.Property(nameof(Order.CurrentTaskId)))));

		// OrderHistory
		//Database.CreateIndex("IX_OrderHistory_OrderRowId", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
		//																		   ValueIndexItem.Expression(Expression.Property(nameof(OrderHistory.OrderRowId)))));
		Database.CreateIndex("IX_OrderHistory_CompleteDate", IndexBuilder.ValueIndex(//ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
																					 ValueIndexItem.Expression(Expression.Property(nameof(OrderHistory.CompleteDate)))));

		// User indexes
		//Database.CreateIndex("IX_User_ClientRowId", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
		//																	ValueIndexItem.Expression(Expression.Property(nameof(User.ClientRowId)))));
	}

blimkemann avatar Feb 01 '19 12:02 blimkemann

Try removing Type from the Type+RowID index. It may be inducing SQLite to choose that index because your query filters on that property.

snej avatar Feb 01 '19 17:02 snej

Here's the query, copied from the forum thread:


QueryBuilder.Select(SelectResult.Expression(Expression.Property(nameof(Order.RowId)).From("O")))
.From(DataSource.Database(Manager.MainDb.Database).As("H"))
.Join(Join.InnerJoin(DataSource.Database(Manager.MainDb.Database).As("O"))
On(Expression.Property(nameof(Order.CurrentTaskId)).From("O").EqualTo(Expression.Property(nameof(OrderHistory.OrderRowId)).From("H"))))
.Where(DatabaseBase.CreateWhereDocumentTypeEquals(nameof(OrderHistory), "H")
.And(Expression.Property(nameof(OrderHistory.CompleteDate)).From("H").IsNullOrMissing()))

Which give this using Explain():

SELECT fl_result(fl_value("O".body, 'RowId')) FROM kv_default AS "H" CROSS JOIN kv_default AS "O" ON (fl_value("O".body, 'CurrentTaskId') = fl_value("H".body, 'OrderRowId')) AND ("O".flags & 1) = 0 WHERE (fl_value("H".body, 'Type') = 'OrderHistory' AND (fl_value("H".body, 'CompleteDate') = x'' OR fl_value("H".body, 'CompleteDate') IS NULL)) AND ("H".flags & 1) = 0 0|0|0| SEARCH TABLE kv_default AS H **USING INDEX IX_User_ClientRowId (<expr>=?)** 0|1|1| SCAN TABLE kv_default AS O

I have these indexes created (among others):

Database.CreateIndex("IX_OrderHistory_CompleteDate", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
			                                                                             ValueIndexItem.Expression(Expression.Property(nameof(OrderHistory.CompleteDate)))));

			// User indexes
			Database.CreateIndex("IX_User_ClientRowId", IndexBuilder.ValueIndex(ValueIndexItem.Expression(Expression.Property(nameof(Data.DocumentBase.Type))),
																				ValueIndexItem.Expression(Expression.Property(nameof(User.ClientRowId)))));

Why isn't it choosing IX_OrderHistory_CompleteDate? The query never ever uses the UserClientRowId field!

snej avatar Feb 01 '19 17:02 snej

Remove all the other indexes worked just fine. But, how can I now look up a specific document by type/rowid which is the primary key (unique value) for each document? I also need to be able to do that.

blimkemann avatar Feb 01 '19 19:02 blimkemann

Yeah, this isn't a true workaround, but it gives another data point.

In general, the "primary key" of your data should be stored as the documentID. That's because

  • The documentID is the only thing that can be guaranteed unique
  • There is implicitly an index on it
  • There's a built-in API call to get a document given its documentID

If you have multiple primary keys, like (type, rowid), the convention is to concatenate them with some unambiguous delimiter like ":", "`,'", etc.

snej avatar Feb 01 '19 19:02 snej

The document id is the same as a sting concatenating my Type and RowId fields with an _ (e.g., "Type_Guidvalue"). So that is automatically created as a primary key?

blimkemann avatar Feb 01 '19 19:02 blimkemann

Yes, because Database.getDocument() needs to be fast.

(Internally, the documentID is the primary key of the SQLite table that stores the documents.)

snej avatar Feb 01 '19 20:02 snej

So that is the 'key' column in the kv_default table? And that is used in lieu of any other search criteria? In other words, that works like any other relational database primary key and thus doesn't need to be considered at all in our user code?

blimkemann avatar Feb 01 '19 20:02 blimkemann

Yup.

snej avatar Feb 01 '19 20:02 snej

Note: It doesn't appear that SQLite is choosing that index for any particular reason. It is just choosing whatever index was created last in the database. If I move the code to create the user indexes first then I get this.

Testing open tasks
kv_default_seqs
IX_TypeId
IX_Client_Key
IX_User_ClientRowId
IX_OrderHistory_OrderRowId
IX_OrderHistory_CompleteDate
SELECT fl_result(fl_value("O".body, 'RowId')) FROM kv_default AS "H" CROSS JOIN kv_default AS "O" ON (fl_value("O".body, 'CurrentTaskId') = fl_value("H".body, 'OrderHistory')) AND ("O".flags & 1) = 0 WHERE (fl_value("H".body, 'Type') = 'OrderHistory' AND (fl_value("H".body, 'CompleteDate') = x'' OR fl_value("H".body, 'CompleteDate') IS NULL)) AND ("H".flags & 1) = 0
0|0|0| SEARCH TABLE kv_default AS H USING INDEX IX_OrderHistory_CompleteDate (<expr>=?)
0|1|1| SCAN TABLE kv_default AS O

If I switch the order of the order history indexes I get this

Testing open tasks
IX_TypeId
IX_Client_Key
IX_User_ClientRowId
IX_OrderHistory_CompleteDate
IX_OrderHistory_OrderRowId
SELECT fl_result(fl_value("O".body, 'RowId')) FROM kv_default AS "H" CROSS JOIN kv_default AS "O" ON (fl_value("O".body, 'CurrentTaskId') = fl_value("H".body, 'OrderHistory')) AND ("O".flags & 1) = 0 WHERE (fl_value("H".body, 'Type') = 'OrderHistory' AND (fl_value("H".body, 'CompleteDate') = x'' OR fl_value("H".body, 'CompleteDate') IS NULL)) AND ("H".flags & 1) = 0
0|0|0| SEARCH TABLE kv_default AS H USING INDEX IX_OrderHistory_OrderRowId (<expr>=?)
0|1|1| SCAN TABLE kv_default AS O

I don't think SQLite has enough information to make an educated choice about which index to use

borrrden avatar Feb 21 '19 21:02 borrrden

Also of note is that IX_OrderHistory_CompleteDate appears in the extended explain output, but no other index does

e.g.

[
		5,
		"ReopenIdx",
		2,
		10,
		0,
		"k(3,,,)",
		"02",
		"root=10 iDb=0; IX_OrderHistory_CompleteDate"
	],

borrrden avatar Feb 21 '19 21:02 borrrden

So is this a case where being able to specify an index hint would be useful? I think so - sometimes we are smarter than the database.

blimkemann avatar Feb 21 '19 21:02 blimkemann

I'm not sure anymore. I wonder if the explain output is actually correct. The extended output was from the original setup which reported using CompleteDate as it should, while explain said that it was using ClientRowId.

borrrden avatar Feb 21 '19 21:02 borrrden

Migrated to CBL-69

borrrden avatar Jun 06 '19 00:06 borrrden

Whoops, keeping this one open so that third party comments can still come in.

borrrden avatar Jun 06 '19 00:06 borrrden

Any plans on when this feature would be implemented? I would like to create partial indexes for certain long-running queries (Using a Where condition and Order by), something like this:

IndexBuilder.ValueIndex(
  ValueIndexItem.Expression(Expression.Property("type").EqualTo(Expression.String("contact"))), 
  ValueIndexItem.Expression(Expression.Not(Expression.Property("isDeleted"))),  
  ValueIndexItem.Property("tenantId"),   
  ValueIndexItem.Property("status"),  
  ValueIndexItem.Property("creationDate")) // creationDate DESC
); ```

bruno-david avatar Nov 05 '19 17:11 bruno-david

It would need to be prioritized by our project management. That's driven in part by customer demand, so if you're a Couchbase customer you should definitely let your SE know this feature is important to you.

snej avatar Nov 05 '19 21:11 snej

Ok, thanks for the information. I will contact them.

bruno-david avatar Nov 06 '19 20:11 bruno-david

Maybe not relate to this issue, but I got a performance issue with the multiple joins (my query joins 8 docs) and it takes about 3 or 4 seconds, I know we shouldn't join many document in NoSQL, but I have no other choice. So I'm checking a temporary solution is to delete all existing index and recreating again (I see c4db_maintenance(db, kC4Reindex, &err) doesn't work), this temp solution worked now a query only takes 0.5 s !!! but only in release mode build so I don't understand why this doesn't work in debug mode build. Do you have any ideas about difference?

stephenlang84 avatar Feb 01 '21 09:02 stephenlang84