
`MemoryVFS` should avoid creating `Proxy` objects in `makeDataArray`

Open grantcox opened this issue 8 months ago • 3 comments

This is a performance improvement, for the MemoryVFS (and MemoryAsyncVFS).

Currently the FacadeVFS protects against WASM memory resizing by wrapping the data arrays passed to jRead and jWrite with a Proxy object. This instance creation for every IO operation is quite heavy, and for the MemoryVFS is unnecessary as the data arrays are consumed immediately.

I've avoided this by overriding the makeDataArray to always point to the underlying WASM memory. And it's much faster - 40x faster for 1000 inserts!
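For readers unfamiliar with the hazard the Proxy guards against, here is a minimal sketch of it. It uses a bare `WebAssembly.Memory` and a hypothetical `fakeModule` object in place of the real Emscripten module, so it only illustrates the detach behaviour and is not wa-sqlite code. A plain view is safe when it is consumed immediately, but it becomes unusable if memory grows while it is still held.

```javascript
// Hypothetical stand-in for the Emscripten module (assumption, not wa-sqlite's real module).
const memory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const fakeModule = {
  // Emscripten refreshes HEAPU8 after growth; this getter imitates that.
  get HEAPU8() { return new Uint8Array(memory.buffer); }
};

// A plain view over 16 bytes at offset 64, as a Proxy-free makeDataArray might return.
const view = fakeModule.HEAPU8.subarray(64, 64 + 16);
view.fill(7); // consumed immediately, before any resize: safe
const lengthBeforeGrow = view.byteLength;

// Growing the memory detaches the old ArrayBuffer, so the held view goes empty.
memory.grow(1);
const lengthAfterGrow = view.byteLength;

// A view created after the resize sees the same bytes again.
const freshView = fakeModule.HEAPU8.subarray(64, 64 + 16);
console.log(lengthBeforeGrow, lengthAfterGrow, freshView[0]);
```

This is why skipping the Proxy is only safe for a VFS that never holds the array across an operation that could allocate WASM memory.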

makeDataArray-no-proxy

Checklist

  • [x] I grant to recipients of this Project distribution a perpetual, non-exclusive, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, sublicense, and distribute this Contribution and such derivative works.
  • [x] I certify that I am legally entitled to grant this license, and that this Contribution contains no content requiring a license from any third party.

grantcox avatar May 13 '25 03:05 grantcox

Thanks for the PR! So 40x faster is certainly quite a bit faster. But are you really using MemoryVFS for anything real where you need that performance? The special ":memory:" database should be faster still.

My concern is that MemoryVFS is mainly provided as an example and a possible starting point to implement a custom VFS. Adding another method makes it a bit longer and more complicated, and perhaps eventually a custom VFS evolved from it would become vulnerable to a bug on WebAssembly memory resize.

I'm thinking that I'm fine with exposing a makeDataArray() method to override, but I'm not so sure about actually overriding it in MemoryVFS.

rhashimoto avatar May 15 '25 18:05 rhashimoto

> But are you really using MemoryVFS for anything real where you need that performance?

Ha, that's a great question! Our project involves an existing shared-business-logic module that is used on web and mobile. It already supports persistence on mobile via SQLite, but only through a fully synchronous interface. On web it uses a completely separate set of data patterns using in-memory collections. It'd be a great simplification to use the same SQLite integration on web, but it's simply not feasible to make it all async. So we're experimenting with a fully-synchronous SQLite WASM build.

However, as we'd really like to have persistence, and encryption on anything persisted, we would like to use a VFS. Enter wa-sqlite. Your examples of all the various VFSs were very helpful for us to feel like there could be a solution in this space. The proof-of-concept that we've developed (a total frankenstein, but you're welcome to view here) basically uses the MemoryVFS, and regularly syncs the DB state to a worker to be persisted. As our web client currently handles having no local state (it pulls it all from the server), and the mobile SQLite integration already supports syncing, it means the local state being a little out of date is totally fine.

Anyway, all that to say that yes, actually using the MemoryVFS in production, and it being as-fast-as-possible, does feel like the best approach.

From a "but does it matter to wa-sqlite" perspective, I would suggest it does. A naive assumption is that the MemoryVFS performance, and the delta compared to :memory:, indicates the cost of "doing VFS work in JavaScript". I initially assumed the poorer performance was due to data crossing a WASM / JS memory boundary, which is not at all the case.

Actually perhaps a better solution for wa-sqlite would be to replace the Proxy with a lambda. It would require changing the interface to jRead and jWrite though, but for example if FacadeVFS had this:

  makeDataArray(byteOffset, byteLength) {
    let target = this._module.HEAPU8.subarray(byteOffset, byteOffset + byteLength);
    return () => {
      if (target.buffer.byteLength === 0) {
        // WebAssembly memory resize detached the buffer.
        target = this._module.HEAPU8.subarray(byteOffset, byteOffset + byteLength);
      }
      return target;
    };
  }

then all VFSs would be safe from WASM memory resize, but they'd also gain almost the full performance benefit as in this current PR. Here are some numbers from our performance tests:

| Test file | :memory: | MemoryVFS | MemoryVFS w/ overloaded makeDataArray no Proxy | MemoryVFS w/ FacadeVFS.makeDataArray : () => pData |
| --- | --- | --- | --- | --- |
| db-dump.sql (85.6 MB) | 2308.8 ms<br>2312.5 ms<br>2317.8 ms | 11329.6 ms<br>11360.4 ms<br>11411.5 ms | 2371.8 ms<br>2386.3 ms<br>2391.3 ms | 2385.9 ms<br>2392.3 ms<br>2388.1 ms |
| real-world-queries.json (226.5 MB) | 4672.6 ms<br>4669.7 ms<br>4671.2 ms | 13700.0 ms<br>13774.7 ms<br>13700.7 ms | 4900.5 ms<br>4984.9 ms<br>4912.7 ms | 4971.1 ms<br>4972.2 ms<br>4947.8 ms |

The db-dump.sql here is from sqlite3 .dump so is a very optimal insert-only file. The real-world-queries.json is 250K queries, with parameter binding, a mixture of reads and writes, recorded from our real application.

> The special ":memory:" database should be faster still.

I've included this in the benchmarks above as a comparison - it is faster, but by a surprisingly small amount.
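To make the lambda idea concrete, here is a rough sketch of how a VFS method might consume the getter. Everything besides `makeDataArray` is an assumption for illustration: `SketchVFS`, `moduleStub`, the `file` buffer, and a `jRead` that accepts a getter are hypothetical and do not match wa-sqlite's current `jRead` signature.

```javascript
// Hypothetical stand-ins (assumptions): a bare WebAssembly.Memory and a module-like object.
const wasmMemory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const moduleStub = {
  get HEAPU8() { return new Uint8Array(wasmMemory.buffer); }
};

class SketchVFS {
  constructor(module) {
    this._module = module;
    this.file = new Uint8Array(4096); // in-memory "file" contents
  }

  // Same shape as the makeDataArray proposed in this comment.
  makeDataArray(byteOffset, byteLength) {
    let target = this._module.HEAPU8.subarray(byteOffset, byteOffset + byteLength);
    return () => {
      if (target.buffer.byteLength === 0) {
        // WebAssembly memory resize detached the buffer.
        target = this._module.HEAPU8.subarray(byteOffset, byteOffset + byteLength);
      }
      return target;
    };
  }

  // Hypothetical jRead that receives the getter instead of an array.
  jRead(getData, fileOffset) {
    const pData = getData(); // resolve the view at the moment of use
    pData.set(this.file.subarray(fileOffset, fileOffset + pData.byteLength));
    return 0; // SQLITE_OK
  }
}

const vfs = new SketchVFS(moduleStub);
vfs.file.fill(42);
const getData = vfs.makeDataArray(128, 8);
wasmMemory.grow(1); // resize between creating the getter and using it
const rc = vfs.jRead(getData, 0);
const bytesInWasm = Array.from(moduleStub.HEAPU8.subarray(128, 136));
```

The only per-call cost is a closure plus a single `byteLength` check, which would be consistent with the last column of the table tracking the no-Proxy column so closely.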

grantcox avatar May 16 '25 00:05 grantcox

> Actually perhaps a better solution for wa-sqlite would be to replace the Proxy with a lambda. It would require changing the interface to jRead and jWrite...

I'm not going to change the function signatures for jRead and jWrite anytime soon. However, an alternative could be to pass a class with the same interface as Uint8Array but implemented with a regular class instead of Proxy.
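One possible shape for such a class is sketched below. This is only an illustration of the idea: `ResizableDataArray` and `heapModule` are hypothetical names, it covers just a few `Uint8Array` methods, and it is not necessarily how wa-sqlite implements it. Note that a plain class like this does not support indexed access such as `data[i]`, so callers would have to go through the methods.

```javascript
// Hypothetical stand-ins (assumptions): a bare WebAssembly.Memory and a module-like object.
const heapMemory = new WebAssembly.Memory({ initial: 1, maximum: 4 });
const heapModule = {
  get HEAPU8() { return new Uint8Array(heapMemory.buffer); }
};

// A regular class exposing part of the Uint8Array interface, no Proxy involved.
class ResizableDataArray {
  constructor(module, byteOffset, byteLength) {
    this._module = module;
    this._byteOffset = byteOffset;
    this._byteLength = byteLength;
    this._view = module.HEAPU8.subarray(byteOffset, byteOffset + byteLength);
  }

  // Return a usable view, refreshing it if a memory resize detached the buffer.
  _live() {
    if (this._view.buffer.byteLength === 0) {
      this._view = this._module.HEAPU8.subarray(
        this._byteOffset, this._byteOffset + this._byteLength);
    }
    return this._view;
  }

  get length() { return this._byteLength; }
  get byteLength() { return this._byteLength; }

  set(source, offset = 0) { this._live().set(source, offset); }
  fill(value, start, end) { this._live().fill(value, start, end); return this; }
  subarray(begin, end) { return this._live().subarray(begin, end); }
}

const data = new ResizableDataArray(heapModule, 256, 4);
heapMemory.grow(1);     // detaches the view captured in the constructor
data.set([1, 2, 3, 4]); // still lands in the right place after the resize
const written = Array.from(heapModule.HEAPU8.subarray(256, 260));
```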

rhashimoto avatar May 17 '25 15:05 rhashimoto

> However, an alternative could be to pass a class with the same interface as Uint8Array but implemented with a regular class instead of Proxy.

#285

rhashimoto avatar Jun 19 '25 17:06 rhashimoto