Dexie.js icon indicating copy to clipboard operation
Dexie.js copied to clipboard

Bulk query - performance report

Open ladi-j opened this issue 4 years ago • 12 comments

Hi there. I've noticed that you've requested real-life performance numbers, so I'd like to share my research when hunting for the problem of our slow queries.

DATABASE SETUP

1 Database loaded in browser, multiple tables with data items ranging from 10 items to 20000. WebPack compiling to ES2017 target. Our web app is built on React + Redux stack, using Sagas, hooks and async selectors. I can share more info if requested, but it's not very important as I've tested just the dexie performance itself (see below)

Selector tested on a table with 447 items. Item type:

export type RawCodingItem = {
  name: string,
  codes_to_meanings: CodesToMeanings, // [code: meaning]
  codes_to_concepts?: CodesToMeanings, // [code: concept]
  display?: DisplayHierarchy // Display order
}

HW & SW

  • Chrome Version 87.0.4280.67 (Official Build) (x86_64) running anonymous window
  • MacBook Pro (13-inch, 2019, Four Thunderbolt 3 ports)
  • Node v12.7.0
  • Dexie 3.0.0-rc7

USE CASE

Select multiple coding items via the primary key. We're selecting about 90% of coding items in our case.

CONTROLLED ENVIRONMENT

  • All re-runs were done with exactly the same state of application, with the same database
  • all API calls were finished
  • no JS was processing in the background
  • no repaints or any other DOM manipulation took place
  • time measured in the Dexie selector itself (so it's not affected by rest of the stack) with new Date().getTime() before and after the selector itself (using await)

RESULTS

SINGLE RUN

(only one selector fired)

Promise.all(name => db.codings.where({name})) - 6836 ms -

const codingObject = await Promise.all(names.map(name => db.codings.where({name}).toArray()));
const filteredCodings = codingObject.flat();
  • 6689 ms
  • 7183 ms
  • 6582 ms
  • 5972 ms
  • 7758 ms

bulkGet.toArray().filter() - 2318 ms -

const codingObject = await db.codings.bulkGet(names);
const filteredCodings = codingObject.filter(({name}) => names.includes(name));
  • 2076 ms
  • 2165 ms
  • 3083 ms
  • 2085 ms
  • 2183 ms

Collection.where().anyOf() - 167 ms -

const filteredCodings = await db.codings
  .where("name")
  .anyOf(names)
  .toArray();
  • 134 ms
  • 244 ms
  • 142 ms
  • 173 ms
  • 141 ms

Collection.filter().toArray() - 140 ms -

const filteredCodings = await db.codings.filter(({name}) => names.includes(name)).toArray();
  • 181 ms
  • 121 ms
  • 158 ms
  • 123 ms
  • 116 ms

Collection.toArray().filter() - 92 ms -

const codingObject = await db.codings.toArray();
const filteredCodings = codingObject.filter(({name}) => names.includes(name));
  • 83 ms
  • 114 ms
  • 83 ms
  • 84 ms
  • 96 ms

CONCURRENT RUN

(2 instances of the same selector fired at the same time)

Promise.all(name => db.codings.where({name})) - 14580 / 10851 ms -

  • 13934 / 10402 ms
  • 15146 / 11473 ms
  • 14660 / 10678 ms

bulkGet.toArray().filter() - 5030 / 4262 ms -

  • 4523 / 4220 ms
  • 5948 / 4166 ms
  • 4620 / 4399 ms

Collection.where().anyOf() - 295 / 968 ms -

  • 245 / 835 ms
  • 266 / 974 ms
  • 351 / 1018 ms
  • 358 / 1008 ms
  • 255 / 1005 ms

Collection.filter().toArray() - 263 / 879 ms -

  • 312 / 899 ms
  • 259 / 854 ms
  • 296 / 948 ms
  • 211 / 773 ms
  • 240 / 910 ms

Collection.toArray().filter() - 138 / 774 ms -

  • 85 / 735 ms
  • 124 / 768 ms
  • 182 / 823 ms
  • 206 / 813 ms
  • 94 / 732 ms

CONCLUSION

Getting all the items from dexie with toArray() and then filtering them with JS .filter function was the fastest in both single and double instance selector. Second fastest in both cases was the Collection.filter().toArray() method.

Interesting observation is that running two selectors on the same table at the same time causes huge performance problems, so this might be a good start for everyone struggling with UI freeze. Thanks to this I've actually discovered that we were doing exactly that by accident :D

ladi-j avatar Dec 14 '20 04:12 ladi-j

Note: I will add one more performance report tommorow. We have a selector querying 20000 objects and then filtering them. Currently it takes ~500ms for DB query and another ~500ms for JS parsing.

Will look into this further

ladi-j avatar Dec 14 '20 04:12 ladi-j

Thanks a lot @Ruskyj! It would be a great benefit if you could share a link to the source code also. I'd like to trace down the reasons of performance issues with Promise.all(name => db.codings.where({name})). In the long term I'd also like to have this test scriptable and make performance reports for both old and future releases.

dfahlander avatar Dec 14 '20 10:12 dfahlander

Access to the source code won't be possible unfortunately, as this is a client's project behind NDA. I can share some extra snippets if needed (our Dexie -> Redux selectors for example).

ladi-j avatar Dec 14 '20 14:12 ladi-j

@dfahlander I've run another set of tests with much bigger table. I've also tried to upgrade to Dexie 3.0.3, with no measurable effect on performance. If you're interested, I can prepare an example of our selector stack (so I won't break the NDA).

DEXIE Fields table run (19627)

Query for 19559 fields (99%), Querying by secondary key

Same app, same database, same environment and testing riles

Table item:

export type RawFieldItem = {
  // Field detail
  is_hierarchical: boolean,
  is_multi_select: boolean,
  is_primary_key: boolean,
  is_sparse_coding: boolean,
  name: string,
  primary_table_name: string,
  serial_type: FIELD_TYPE,
  title: string,
  type: FIELD_TYPE,

  // Optional keys
  coding_name?: string,
  concept?: string,
  description?: string,
  folder_path?: string[],
  linkout?: string,
  longitudinal_axis_type?: "primary" | "secondary"
  units?: string,
  mapping?: {
    column: string,
    column_sql_type: string,
    database_id: string,
    database_name: string,
    database_unique_name: string,
    table: string,
  },
  referenced_entity_field?: {
    entity: string,
    field: string,
    database_id: string,
    database_name: string,
    database_unique_name: string,
    table: string,
    column: string,
  }
  stats?: {
    min: number,
    max: number,
    avg: number,
    null_count: number,
  }
}

Results

Promise.all(name => db.codings.where({name})) - N/A ms -

  • Not applicable, as we’re filtering by one key value only

bulkGet.toArray().filter() - N/A ms -

  • Not applicable, as required filter is not a primary key

Collection.where().equals() - 423 ms -

  • 420 ms
  • 426 ms
  • 417 ms
  • 423 ms
  • 431 ms

Collection.filter().toArray() - 1614 ms -

  • 1457 ms
  • 1386 ms
  • 1745 ms
  • 1854 ms
  • 1627 ms

Collection.toArray().filter() - 339 ms -

  • 351 ms
  • 339 ms
  • 342 ms
  • 330 ms
  • 335 ms

CONCLUSION

Similar results as with smaller table. Querying everything and then filtering it with vanilla JS is still the fastest. Interesting change was a big performance drop of Collection.filter method when dealing with bigger table.

ladi-j avatar Dec 15 '20 00:12 ladi-j

Number for Collection.where().equals() surprice a bit. But it doesn't say so much without knowing how the code is executed, awaited and whether it's in a transaction or not, how large the queried indexes are, etc.

Thanks anyway for sharing the numbers! We will need to create a performance test for Dexie where we have the entire preconditionset - source and data, and continue from there.

dfahlander avatar Dec 15 '20 07:12 dfahlander

@dfahlander I've tried to put our whole selector stack (from DB to React) in this example. Business logic was removed as it's not important for tests. All performance results were performed in the DB selector itself (see comments).

Let me know if this helps you, or I'm doing something really bad here :D

EXAMPLE STACK

// Database constructor
class DeepStateDB extends Dexie {
  public fields!: Dexie.Table<FieldItem, string>;
  public folders!: Dexie.Table<FolderHierachicalItem, DATASET_FIELD_TYPES>;
  public codings!: Dexie.Table<RawCodingItem, string>;
  public config!: Dexie.Table<ConfigItem, CONFIG_TYPES>;
  public entities!: Dexie.Table<EntityItem, string>;


  constructor(recordID) {
    super(recordID);
    const db = this;

    db.version(VERSIONS.DATABASE).stores({
      fields: "[entity+name], database, entity, name",
      folders: "type",
      codings: "name",
      config: "type",
      entities: "name",
    });

    this.fields = this.table("fields");
    this.folders = this.table("folders");
    this.codings = this.table("codings");
    this.config = this.table("config");
    this.entities = this.table("entities");
  }
}


// Database acessor
export const DeepStateRecord = (recordID: string): DeepStateDB => new DeepStateDB(recordID);


// Unviversal selector
async function createDBselector<S extends(...args: any[]) => any>(
  recordID: string | undefined,
  selectorFunc: S,
  ...args
): Promise<ReturnType<S>> {
  if (!recordID) throw new Error("Can't open a database connection without record-id!");

  const db = DeepStateRecord(recordID);
  const selectorResult = await selectorFunc(db, ...args);

  return selectorResult;
}


// Specific combined selector
export async function asyncSelectPrimaryPhenoField(
  db: DeepStateDB,
): Promise<FieldItem | undefined> {
  /* ---------------- PERFORMANCE WAS MEASURED HERE ---------------- */
  const primaryFieldPath = await asyncSelectPrimaryPhenoFieldPath(db);
  const primaryField = await dbSelectSingleField(db, primaryFieldPath);
  

  return primaryField;
}


// DB selector
export async function dbSelectSingleField(
  db: DeepStateDB,
  fieldPath: FieldID,
): Promise<FieldItem | undefined> {
  const singleField = await db.fields
    .where(["entity+name"])
    .equals(fieldPath)
    .first();

  return singleField;
}



/* ********* EXAMPLES OF USAGE ********* */
// Sagas
function* exampleSaga(recordID) {  
  const result = yield call(createDBselector, recordID, asyncSelectPrimaryPhenoField);
  return result;
}

// Classic async
async function exampleAsync(recordID) {
  const result = await createDBselector(recordID, asyncSelectPrimaryPhenoField);
  return result;
}

// Using custom fork of createSyncSelector from "async-selector-kit" with small changes to redux dispatch
export const [selectResult, selectResultLoading] = createAsyncSelectorWithDispatch(
  {
    async: async(recordID) => createDBselector(recordID, asyncSelectPrimaryPhenoField),
    id: "componentSelectorExample",
  },
  [
    selectRecordID,
  ],
);

ladi-j avatar Dec 15 '20 14:12 ladi-j

Is a new Dexie instance created for every query? If so it could explain a longer response time due to the time it takes for the browser to open the database. It is recommended to have a single instance of Dexie in your app so that it only needs to open on page load. It would also explain why parallel queries would be slow as database opening can only be done one at a time.

dfahlander avatar Oct 25 '21 12:10 dfahlander

I believe we only create one instance per page mount (as we define it once and then use that instance to access any database selector).

Is there an easy way how to test it/count the instances?

ladi-j avatar Oct 25 '21 12:10 ladi-j

The number of instances can be counted using Dexie.connections.length. It's strange that a plain equals- request takes seconds and not milliseconds. In this fiddle we get a completely different performance, so something must be very different.

dfahlander avatar Oct 25 '21 12:10 dfahlander

It's possible that our abstraction layer is creating some sort of bottleneck. I'll test the number of connections and report back.

Whole implementation is outlined in the "example" stack comment. Please let me know if you see any possible issues straight away.

ladi-j avatar Oct 25 '21 13:10 ladi-j

Any updates on this?

muhamedkarajic avatar Mar 06 '23 07:03 muhamedkarajic

I guess this issue is related.

muhamedkarajic avatar Jun 24 '23 11:06 muhamedkarajic