UniqueIndex query performance improvement
Opened for discussion. Please see details in pull request #105
Merged!
I like that pull request, nice work!
I made some minor adjustments:
- You had a shortcut to delegate directly to the UniqueIndex if possible, on the first line of the CollectionQueryEngine.retrieve() method. However, this would bypass the code which orders results. As the UniqueIndex supports in() queries as well, it can sometimes return more than one object in the ResultSet, so I think we still need to order results. However the other place where you delegate to the UniqueIndex in getResultSetWithLowestRetrievalCost(), should cover that. The question is what the performance would be like. Can you re-test it?
- I moved the code which performs instanceof checks, into the UniqueIndex.supportsQuery method, as this is the purpose of that method. I overrode the superclass implementation to ensure performance should be the same.
- In the addAttributeIndex() method, I moved the new code for UniqueIndex code outside of an if() block, as there was an edge case where it would only work if the UniqueIndex was the first index added.
Can you re-test with the latest code?
It's now Ok, but not perfect :)
I've updated my tests a bit here in case you'll need it.
Let's consider following timings
| Benchmark | Average | Units | Description |
|---|---|---|---|
| HashTest.queryHash | 0.530 | us/op | Top time I'd like to get |
| QueryTest.queryHash | 1.141 | us/op | Before the fix |
| QueryTest.queryHash | 0.837 | us/op | Compiled from current master. Current state |
| QueryTest.queryHash | 0.679 | us/op | Change-1 (see below) |
| QueryTest.queryHash | 0.540 | us/op | Change-2 (see below) |
-
Change-1 Add shortcut for Equals query and uniqueIndexes in CollectionQueryEngine#retrieve as below:
if(query instanceof Equal) { Index<O> uniqueIndex = uniqueIndexes.get(((Equal)query).getAttribute()); if (uniqueIndex != null && uniqueIndex.supportsQuery(query, queryOptions)) { return uniqueIndex.retrieve(query, queryOptions); } }This saves 0.140 us/op or 19% saving. I guess it's safe, as it we do not need any queryOptions processing for the single Equals query.
-
Change-2 Comment out two first lines in ConcurrentIndexedCollection#retrieve
public ResultSet<O> retrieve(Query<O> query, QueryOptions queryOptions) {
final QueryOptions queryOptionsOpened = queryOptions;
// final QueryOptions queryOptionsOpened = openRequestScopeResourcesIfNecessary(queryOptions);
// flagAsReadRequest(queryOptionsOpened)
It's definitely would not work now, but would save another 0.140 us/op and we will be pretty close to HashTest.queryHash timings. Those two lines commented actually put two values to query options hash ad flags:
...
queryOptions.put(Persistence.class, persistence);
...
FlagsEnabled.forQueryOptions(queryOptions).add(PersistenceFlags.READ_REQUEST);
And this is rather expensive for this kind of queries. I guess It's unfixable in current version. I'd have in QueryOptions fields instead of storing in hash flags and options that are known at compile time. This would definitely faster. For now I'm just checking those option and flag already present in the query - it's a bit faster than putting them to map. This will finally give 0.594 us/op that is only 11% slower than hash. I'd say it's perfect for now. Will commit in a minute.
Add pull request https://github.com/npgall/cqengine/pull/107 Please review.
Thank you for your time.
Thanks for the additional suggestions.
Re: Change 1 - this would need to be in the retrieveRecursive() method instead of the retrieve() method, in order for it to work with nested queries.
Re: Change 2 - I created an experimental branch to see if we could measure the performance difference by not allocating HashMaps in QueryOptions. It's a work in progress: https://github.com/npgall/cqengine/blob/optimize-query-options/code/src/main/java/com/googlecode/cqengine/query/option/QueryOptions.java
Re: Re: Change 1 - this would need to be in the retrieveRecursive() method instead of the retrieve() method, in order for it to work with nested queries
nested queries works well as it's properly handled in getResultSetWithLowestRetrievalCost() which is called from retrieveRecursive() adding the check for Equals in retrieve() is a shortcut to bypass any other checks and return result set as fast as possible.
I've experimented further based on the idea from your optiize-query-options branch. Below is a benchmark values for different options. Note, I'm experimenting on different hardware and values are incomparable to above. Also note that I've added another test that do not cache query options between executions.
| Benchmark | Caching | nonCaching | Units | Description |
|---|---|---|---|---|
| HashTest.queryHash | 0.207 | 0.207 | us/op | This is new top time to compare. Note it's 2.5 times faster than above |
| QueryTest.uniqueQuery | 0.768 | 0.897 | us/op | Equals query test for cqengine 2.8.0 |
| QueryTest.uniqueQuery | 0.520 | 0.630 | us/op | Compiled from current master. Current state |
| QueryTest.uniqueQuery | 0.422 | 0.562 | us/op | check for Equals in retrieve and check options before queryOptions.put |
| QueryTest.uniqueQuery | 0.406 | 0.424 | us/op | Completely reworked QueryOption and FlagsEnabled to use properties instead of hashes. All changes in my branch HERE |
As you could see, latest commits most win when not cache query option object. I have not investigated why I still have 2 times difference compared to HashTest on a better hardware.
We can't represent all of the flags as an enum actually, because we need to allow for extensibility. Several companies using CQEngine (mine included) have their own implementations of indexes and queries, and we need a way to allow index-specific flags to be set, where we don't know the keys in advance.
I have been reconsidering the approach I took in the branch actually. Mainly because of backward compatibility and the number of API changes involved.
I think another option may be to provide a QueryOptions.reset() method, to allow QueryOption objects to be recycled between requests instead. The reset() method would remove request-specific parameters, but would leave existing but empty HashMap and FlagsEnabled objects in place. This way we could avoid the need for new objects to be allocated each time.
I'll see if I can create a branch to test this.
I agree about compatibility, but I'm afraid it's hard to get top values with hash (with reset as well). Note also, using hash, has additional GC pressure (with reset() as well) - each key, value pair wrapped with new HashEntry object. They do probably stay in young gen and be GCed relatively fast, but no gc is better than any gc. Guess you'll get values close to line#3->nonCaching in the table above (corrected to your hardware) with reset().
That's a good point about HashEntry/Node objects. I'll prepare a different branch so we can test it. It would be nice to find a way to avoid the HashEntry allocations.
EDIT: relevant: http://stackoverflow.com/questions/23828425/is-there-a-hashmap-implementation-in-java-that-produces-no-garbage
Just a heads up that I've released CQEngine 2.9.1 which includes your first pull request and some of these performance improvements (many thanks!). I'll keep this issue open until I/we've had time to look at the other optimizations in this issue as well.
Thank you on update and the time you spend on cqengine.