
IO is extremely slow

Open tuxcrafting opened this issue 2 years ago • 7 comments

Running this bench

import { DB } from 'https://deno.land/x/[email protected]/mod.ts';

function insert(db: DB, s: string): void {
    db.query('INSERT INTO tbl ( value ) VALUES ( :value )', { value: s });
}

function bench(filename: string, n: number = 10000): void {
    const db = new DB(filename);
    db.query('CREATE TABLE tbl ( id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT )');
    console.log(`start ${filename} w/ ${n} insertions`);
    const a = Date.now();
    for (let i = 0; i < n; i++) {
        insert(db, `${i}`);
    }
    const b = Date.now();
    const dt = b - a;
    console.log(`total ${dt} ms, ${dt / n} ms per query`);
    db.close();
}

bench(':memory:');
bench('test.db', 10);

Results in

start :memory: w/ 10000 insertions
total 625 ms, 0.0625 ms per query
start test.db w/ 10 insertions
total 1322 ms, 132.2 ms per query

When profiling, it appears that the majority of the time is spent in Deno internals, namely opSync, so I don't know if this is fixable on the library end.

tuxcrafting avatar Mar 01 '22 06:03 tuxcrafting

This might be related to #165. Are you running your tests on a Mac?

dyedgreen avatar Mar 05 '22 17:03 dyedgreen

Nope, this is Fedora Linux.

tuxcrafting avatar Mar 05 '22 19:03 tuxcrafting

Hm yeah. Not sure what we can do outside of making deno itself faster? We could try to buffer writes internally and only go to deno when sqlite requests a sync? (We’d definitely have to benchmark this though)

dyedgreen avatar Mar 06 '22 10:03 dyedgreen

We could try to buffer writes internally and only go to deno when sqlite requests a sync?

This would be great. I have a project where batching inserts in transactions is impractical, because each inserted value depends on querying the values inserted before it. I'd love not to have to write my own buffer abstraction.

sgwilym avatar May 18 '22 13:05 sgwilym

I can reproduce similar times on my Windows 10 laptop with an HDD (~0.03 ms per query in-memory and ~170 ms per query writing to disk).

For comparison, I wrote a (hopefully) equivalent benchmark for python's built-in sqlite3:

import sqlite3
import time
import os

def insert(db, s):
    db.execute('INSERT INTO tbl ( value ) VALUES ( :value )', { "value": s })

def bench(filename, n = 10000):
    db = sqlite3.connect(filename, isolation_level=None)
    db.execute('CREATE TABLE tbl ( id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT )')

    print(f'start {filename} w/ {n} insertions')
    a = time.perf_counter()
    for i in range(n):
        insert(db, f'{i}')
    b = time.perf_counter()
    dt = (b - a) * 1000
    print(f'total {dt} ms, {dt / n} ms per query')
    db.close()

if os.path.exists('test.db'):
    os.remove('test.db')

bench(':memory:')
bench('test.db', 10)

isolation_level=None should enable autocommit, which as far as I understand matches the default behavior of x/sqlite.

With this benchmark in Python I got times of ~0.006 ms per query in-memory and ~170 ms per query writing to disk, which is a practically identical time in the IO-bound case. This seems to suggest that while the IO is a bottleneck, it is not a problem with this specific sqlite library or Deno, but rather a general issue with running SQLite in autocommit mode without WAL. (It is worth noting that, if I understood the documentation correctly, autocommit is disabled by default in Python's sqlite3 library.)

For reference, enabling WAL in python reduced the time of this benchmark to an average of ~40 ms per query on my laptop.
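For anyone following along, enabling WAL from Python is a single pragma. A minimal sketch (the wal_demo.db filename is illustrative, not from the thread; WAL requires a real file and is not supported for :memory: databases):

```python
import os
import sqlite3

# Illustrative path; WAL needs a file-backed database.
path = "wal_demo.db"
if os.path.exists(path):
    os.remove(path)

db = sqlite3.connect(path, isolation_level=None)
# The pragma returns the journal mode actually in effect ("wal" on success).
mode = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
db.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT)")
db.execute("INSERT INTO tbl (value) VALUES (:value)", {"value": "0"})
db.close()
```

Under WAL, writes append to a separate -wal file and sync less often, which is where the speedup above comes from.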

FeldrinH avatar Jun 10 '22 19:06 FeldrinH

Interesting! Enabling WAL is tricky, since that requires memory-mapping the file; (although we might be able to make it work by hacking around with a shared WASM buffer? But it seems tricky to get right while preserving the reliability guarantees of SQLite.

dyedgreen avatar Jun 10 '22 20:06 dyedgreen

It might be possible to enable WAL, if we use EXCLUSIVE locking: https://sqlite.org/wal.html#noshm (given that we don’t have a good way to get mapped files in deno for now)
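The no-shared-memory variant from that link can be sketched with Python's sqlite3 (the excl_wal_demo.db filename is illustrative): with locking_mode=EXCLUSIVE set before entering WAL, SQLite keeps the WAL index in heap memory and never creates the -shm shared-memory mapping.

```python
import os
import sqlite3

# Illustrative path for the demo.
path = "excl_wal_demo.db"
if os.path.exists(path):
    os.remove(path)

db = sqlite3.connect(path, isolation_level=None)
# Per https://sqlite.org/wal.html#noshm: with an EXCLUSIVE lock the WAL index
# lives in heap memory, so no shared-memory file mapping is needed.
locking = db.execute("PRAGMA locking_mode=EXCLUSIVE").fetchone()[0]
journal = db.execute("PRAGMA journal_mode=WAL").fetchone()[0]
db.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT)")
db.close()
```

The trade-off is that no other connection can read or write the database while the exclusive lock is held.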

dyedgreen avatar Jun 21 '22 06:06 dyedgreen

There are also other ways to make INSERTs faster, read: https://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite

E.g. if you do:

  db.transaction(() => {
    for (let i = 0; i < n; i++) {
      insert(db, `${i}`);
    }
  });

Then [0.0059 ms per query]:

start :memory: w/ 10000 insertions
total 67 ms, 0.0067 ms per query
start test.db w/ 10000 insertions
total 59 ms, 0.0059 ms per query
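The same batching works outside deno-sqlite too. A sketch of the equivalent explicit transaction in Python's sqlite3, reusing the schema from the earlier benchmark (:memory: is used only to keep the snippet self-contained):

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY AUTOINCREMENT, value TEXT)")

# One explicit transaction around all inserts: SQLite syncs once at COMMIT
# instead of once per INSERT statement.
db.execute("BEGIN")
for i in range(10000):
    db.execute("INSERT INTO tbl (value) VALUES (:value)", {"value": str(i)})
db.execute("COMMIT")

count = db.execute("SELECT COUNT(*) FROM tbl").fetchone()[0]
```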

Write transactions in SQLite are slow, as you've noticed, because SQLite asks the OS to sync the file changes to disk. Disabling that with db.query("PRAGMA synchronous = OFF"); is also an option. This yields (no transaction used) [0.1398 ms per query]:

start :memory: w/ 10000 insertions
total 93 ms, 0.0093 ms per query
start test.db w/ 10000 insertions
total 1398 ms, 0.1398 ms per query
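For completeness, the same pragma in Python's sqlite3. A sketch only (:memory: keeps it self-contained), since synchronous = OFF trades durability for speed: a crash or power loss can lose or corrupt recent writes.

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("PRAGMA synchronous = OFF")
# Querying PRAGMA synchronous returns a number: 0 = OFF, 1 = NORMAL, 2 = FULL.
sync = db.execute("PRAGMA synchronous").fetchone()[0]
```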

For reference this is perf on my machine with no changes (it's NVMe SSD, linux) [1.3012 ms per query]:

start :memory: w/ 10000 insertions
total 98 ms, 0.0098 ms per query
start test.db w/ 10000 insertions
total 13012 ms, 1.3012 ms per query

nsf avatar Aug 10 '22 18:08 nsf

I feel like this issue can be closed. It seems that most, if not all, of the slowdowns observed here come down to how SQLite operates, not any problems with this Deno integration.

FeldrinH avatar Aug 10 '22 19:08 FeldrinH

I think I agree with @FeldrinH. There is not much we can do to make OS sync / Deno's IO faster, and solutions exist, e.g. using transactions or turning off sync if you are happy with that trade-off.

dyedgreen avatar Aug 10 '22 20:08 dyedgreen