Dexie.js icon indicating copy to clipboard operation
Dexie.js copied to clipboard

Compounding .where() and .orderBy()

Open peterbe opened this issue 7 years ago • 18 comments

A Table has .orderBy() and returns a Collection. A table also has .where() and returns a Collection. How do I combine the two?

For example, this works:

let table = db.issues
let collection = table.orderBy('updated_at')
// optionally I can add the chainable `collection = collection.reverse()`

And this works:

let table = db.issues
let collection = table.where('key').anyOf('a', 'b')

But this doesn't works:

let table = db.issues
let collection = table.where('key').anyOf('a', 'b')
collection = collection.orderBy('updated_at')

peterbe avatar Aug 01 '16 18:08 peterbe

Ideally, I should be able to chain freely, like in the Django ORM:

let table = db.issues
let results = table
if (some condition) {
  results = results.where(...).equals(...)
}
results = results.orderBy('somekey').reverse().limit(10).offset(20)
if (other condition) {
  // more filtering!
  results = results.where(...).anyOf(...)
}
// eventually, send the query definition to the database
results.toArray().then(records => ...)
// and I should be able to make a count too
results.count().then(number => ...)

peterbe avatar Aug 01 '16 18:08 peterbe

True, but the implementation would have to choose whether to use the index for sorting or filtering. With Dexie (as of current version at least), you will have to make that choice yourself and not rely on a qualified performance guess done by the system. An SQL system that gets that kind of query will make a choice whether to use the index for sorting or filtering. If sorting is choosed, it will do a full-table scan sorting with the index. If filtering is choosed, it will sort the resulting array after retrieval. With Dexie you will have to do the choice yourself and choose a strategy depending on how narrowing the filter is, and how large the result set would be. If the filter is narrow and the result set is small, choose where() to utilize the index for filtering, combined with sortBy() on the collection (or just sort the resulting array using Array.sort()). If the result set is large or filter is broad, you would probably be better of using orderBy() to utilize the index for sorting and combine with Collection.filter() for filtering. If you use limit() on the Collection, this is also normally the best choice.

That said. I am tempted to change the next major version of Dexie so that there is only a Collection class and no Table class, with possibility to chain where() and orderBy() clauses as with real ORMs. Today Dexie pretty much maps to the indexedDB API with not that much magic put on top of it.

dfahlander avatar Aug 01 '16 22:08 dfahlander

Thank you for the answer.

Unfortunately, it means I can't use Dexie to combine where, limit and orderBy. Granted my database isn't enormous but if I pick orderBy I have to filter out almost ALL other records in memory. If I chose where I have to first sort ALL other records in memory. Either way, if I have 10,000 records in the database (and 1,000 belonging to some constraint) and I only want the top 10 (order by something), I would have to extract 1,000 records from the database do the sortBy or filter, show the top 10, then "drop" the other 990 objects extracted.

I'm not sure I have the capacity to help the project but I would LOVE for there to be an ability to do ordering AND constraints in 1 query.

peterbe avatar Aug 02 '16 10:08 peterbe

If you only want the top X items you'd normally be better off using the index for sorting and filtering out the data in memory. That way your performance wont be affected as the database size increases. If I'd ever implement the possibility to combine orderBy() with where(), I'd initially build a simple query planner that would always utilize the query for sorting rather than filtering in case Collection.limit() was used on the collection.

dfahlander avatar Aug 22 '16 13:08 dfahlander

Hello David,

Thank you for the awesome job with both building Dexie and maintaining the "community" here and on stackoverflow. I am also wondering when might the ability to combine where with orderBy/sortBy be coming to Dexie?

igorshmukler avatar Sep 28 '17 06:09 igorshmukler

I'm sorry to say that I cannot give an estimate for that, though I wish I could. If the Collection class where rewritten to be more abstract (not tied to indexedDB operations), we could do such things on it as orderBy(). That is part of the "Vision for Dexie". If everything goes as plan, I will be able to release some new architecture in the winter/spring, including allowing arbritary backends (to use the Dexie API towards a postgres database for example). But as the project is not yet financed at all, these are just hopes and visions so far.

The work done so far on this is in the branch next-expression-engine but is still totally untested.

dfahlander avatar Sep 28 '17 09:09 dfahlander

Hi David, 1. Does Dexie do Case-insensitive sorting 2. Does Dexie supports contains/indexOf 3. Why does Dexie processing latest request while cancelling previous pending request. For Instance : while searching on dexie db with search string "p" if this request is under process meantime if search string updated with "pe" then previous request("p") cancelled and updated search string request with "pe" is processing and it returns single response which is related to "pe" but i supposed to get two response one is for "p" and second is for "pe" NOTE : This issue cause live search inaccurate result.

Is there any workaround? 

     

pugazhendhisp avatar Aug 10 '18 14:08 pugazhendhisp

Is this still not possible?

aprixon01112017 avatar May 13 '19 15:05 aprixon01112017

@aprixon01112017 The state of Dexie 3 is still that you need to choose what you want to use the index for - if used for sorting, use orderBy(), if used for filtering, use where(). If you build up a compound index, you may combine the two by having the second part representing the orderBy while using the first part in your query, such as querying all friends named 'Matthew' and order the result by their age: where('[name+age]').between(['Matthew', Dexie.minKey], ['Matthew', Dexie.maxKey]). However, this works only for equality comparision, and the way to use it is not obvious.

The plan is to release a new major (Dexie 4) at some time, with a lot of news. One is that the query engine will accept arbitrary filter-expressions and ordering, even when it cannot utilize an index. Pretty much the same approach as other databases use (SQL / MongoDB, etc) and try to do a best-effort choice automatically instead. I cannot promise the time schedule for this as I'm working as a consultant most part of my work time, so it has taken a great deal of time to complete the design of this new API, but I feel I've landed in a great API now that I'm looking much forward to get the time to release. Some of the news I'm planning for, are:

  • React Suspense support
  • No separation between Table and Collection.
  • Nested where clauses and allow orderBy on Collection.
  • Let Collections be compatible with ES-Observable and have a subscribe() method, also let any database querying expression be observable.
  • Will use an API that is less IndexedDB-specific and can be implemented using other resolvers.
  • The possibility to implement a server for it and have the two support conflict-free sync (Strong Eventual Consistency).
  • Easier typescript integration
  • Simplified migration
  • Support for arbritary expressions (complex, nested expressions, partially mongodb compatible)
  • Better testability (in-memory resolver for the API)
  • Collated indexes (getting a collated sort order by using the index)

dfahlander avatar May 13 '19 23:05 dfahlander

I'm trying this aproach: https://dexie.org/docs/Collection/Collection.offset()#paged-or-queries

It says that "Will have equal performance for page 1 and page 100."

But I can't understant how that is possible if I need go straith to a given page.

How to get the lastEntry without run by all the pages until the one I want?

CassianoSF avatar Aug 05 '19 13:08 CassianoSF

That piece of code in the docs is conceptual. It represents a way of using the index for sorting on other key than the indexes used for filtering. I don't think it scales for result sets larger than 100,000 (or less) as the Set of primary keys needs to fit in memory all the time, but the time to retrieve next page would always be constant. Given that you have the lastEntry from the previous page, you will get the same amount of data fetching to retrieve the next page, but you cannot start at page 100 since you wouldn't have the lastEntry from page 99 unless you've navigated there before.

  1. Get all primary keys from the evaluation of the expression.
  2. Put those keys into a Set
  3. Enumerate the index from prevKey and forward until page is filled. get the actual value for each primary key in the page.

So basically there are two data fetchings which would be equal all the time:

  1. Getting all the primary keys for the entire result.
  2. Enumerating index from a certain key and forward.

There was also a bug in the snippet (and might be others as the code isn't tested): pageKeys.length === PAGE_SIZE shoud be () => pageKeys.length === PAGE_SIZE. I just updated the page to fix that.

dfahlander avatar Aug 05 '19 15:08 dfahlander

Right. So, I think I'm stuck with a memory bloat issue, but thanks for the clarification!

CassianoSF avatar Aug 05 '19 16:08 CassianoSF

Hi @dfahlander, thanks for giving us Dexie.js, it is totally amazing. Just wanted to check on v4 plan

The plan is to release a new major (Dexie 4) at some time, with a lot of news.

Is there any plan in near future for it?

jainkuniya avatar Oct 17 '22 09:10 jainkuniya

Hi @dfahlander, thanks for giving us Dexie.js, it is totally amazing. Just wanted to check on v4 plan

The plan is to release a new major (Dexie 4) at some time, with a lot of news.

Is there any plan in near future for it?

I originally had the plan to release it at the end of 2022 but my current focus is getting dexie-cloud up to production so dexie@4 release will have to be delayed. It is possible to install the current beta using npm install dexie@next - it's an easy and backward compatible upgrade, but it can be disappointing since most features mentioned in the road map are not in place yet. My current guess would be it be fully implemented at some point in the summer or autumn 2023.

dfahlander avatar Oct 18 '22 06:10 dfahlander

@dfahlander related to this suggestion you made above:

If you build up a compound index, you may combine the two by having the second part representing the orderBy while using the first part in your query, such as querying all friends named 'Matthew' and order the result by their age: where('[name+age]').between(['Matthew', Dexie.minKey], ['Matthew', Dexie.maxKey]). However, this works only for equality comparision, and the way to use it is not obvious.

I am in a situation where this is useful to me and I would like to share the code with others in hopes it will help them, too, and with you in case you spot an issue with it I might have missed.

I need where+orderBy because in my case I am paging the database table and optionally skipping entries which are marked a certain value in one of their cells. I used an index on that column to be able to say where('isValid').notEqual(skipValid ? 'true' : ''), but with this code, the order of the items returned was not the same as the order in which they were loaded in the database (they were alphabetical based on the indexed cell's value).

At your suggestion I used a compound index and used the primary key as the second component so that the entries would always come out in the original order. I have prototype a solution based on this and it seems pretty fast for my use case. Here it is with 1k rows, both lookups performing well. In production I am working with datasets of 100k to 1M. That's why I am keen on not doing the sorting on the main thread. I tried with those numbers and it seems decent as well! It seems to me with this compound index main thread sorting is avoided? The performance seems to suggest so at least…

https://stackblitz.com/edit/dexie-playground-strc7n?file=index.ts

import { Dexie } from 'dexie';

const db = new Dexie(Date.now().toString());
db.version(1).stores({
  test: '++id,[isValid+id]',
});

void (async function () {
  try {
    for (let index = 0; index < 1000; index++) {
      await db['test'].add({ index });
    }

    console.log(await db['test'].count());

    console.log('preview before changes:');
    for (const entry of await db['test'].limit(5).toArray()) {
      console.log(entry);
    }

    await db['test']
      .toCollection()
      .modify((entry) => (entry.isValid = (entry.index % 2 === 0).toString()));

    console.log('preview after changes:');
    for (const entry of await db['test'].limit(5).toArray()) {
      console.log(entry);
    }

    function getEntries(skipValid: boolean) {
      let query = db['test'];
      if (skipValid) {
        query = query
          .where('[isValid+id]')
          .between(['false', Dexie.minKey], ['false', Dexie.maxKey]);
      }

      return query;
    }

    const invalidStamp = Date.now();
    console.log('preview of invalid ascending:');
    for (const entry of await getEntries(true).limit(5).toArray()) {
      console.log(entry);
    }
    console.log('====>', Date.now() - invalidStamp, 'ms');

    const invalidReductionStamp = Date.now();
    console.log('reduction of invalid ascending:');
    const invalidStats = { valid: 0, invalid: 0 };
    await getEntries(true).each((entry) => {
      if (entry.isValid === 'true') {
        invalidStats.valid++;
      } else {
        invalidStats.invalid++;
      }
    });
    console.log(invalidStats);
    console.log('====>', Date.now() - invalidReductionStamp, 'ms');

    const allStamp = Date.now();
    console.log('preview of all ascending:');
    for (const entry of await getEntries(false).limit(5).toArray()) {
      console.log(entry);
    }
    console.log('====>', Date.now() - allStamp, 'ms');

    const allReductionStamp = Date.now();
    console.log('reduction of all ascending:');
    const allStats = { valid: 0, invalid: 0 };
    await getEntries(false).each((entry) => {
      if (entry.isValid === 'true') {
        allStats.valid++;
      } else {
        allStats.invalid++;
      }
    });
    console.log(allStats);
    console.log('====>', Date.now() - allReductionStamp, 'ms');
  } catch (error) {
    console.log(error.message ?? error);
  }
})();

TomasHubelbauer avatar Jul 21 '23 13:07 TomasHubelbauer