Migrate to the duckdb/duckdb-async npm packages instead of duckdb-wasm
As noted in #5332, duckdb-wasm does not support reading from compressed CSV/TSV files for some reason. Additionally, multithreaded execution is not well supported in the WASM build (background is discussed in https://duckdb.org/2021/10/29/duckdb-wasm.html), which means that performance in Positron will be hampered.
We could address both issues, multithreading and reading compressed files without workarounds (like decompressing a gzip file in memory in Node.js and then feeding it into DuckDB), by switching to duckdb-async, a TypeScript async wrapper for the duckdb Node.js bindings. This is what https://github.com/antonycourtney/tad uses, for example. This is ultimately the direction we need to go.
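For context, the in-memory workaround mentioned above looks roughly like the following sketch, which uses only Node's built-in `zlib` module. The helper name `decompressCsvInMemory` is hypothetical, and the hand-off to duckdb-wasm is deliberately left out; the point is only that the whole file has to be inflated in Node.js memory before DuckDB can read it.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical helper illustrating the workaround: the entire compressed file
// is inflated into memory in Node.js before DuckDB ever sees it.
function decompressCsvInMemory(compressed: Buffer): string {
  // gunzipSync returns a Buffer holding the full decompressed payload,
  // so peak memory grows with the uncompressed file size.
  const raw: Buffer = gunzipSync(compressed);
  return raw.toString("utf8");
}

// Simulate a gzip-compressed CSV file (in practice this would be read from disk).
const csv = "id,name\n1,alpha\n2,beta\n";
const gz: Buffer = gzipSync(Buffer.from(csv, "utf8"));

const text = decompressCsvInMemory(gz);
// The decompressed text would then be passed on to duckdb-wasm;
// that step is omitted here.
console.log(text.split("\n")[1]); // prints 1,alpha
```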
The downside of the duckdb Node.js bindings is that the DuckDB C++ library gets built from source during npm install and statically linked into the Node.js extension. I tinkered with this locally, and while it's possible to get the DuckDB build running in parallel on Linux/macOS, I couldn't figure out how to get it building in parallel on Windows. A single-threaded build of DuckDB takes well over 10 minutes, which would not be ideal, so I'm not going to pursue this for the time being until we can resolve the build performance issue.
Perhaps adopting the new DuckDB Node.js API would be ideal, as it would eliminate the build issue?
Main Differences from duckdb-node
- Native support for Promises; no need for separate duckdb-async wrapper.
- DuckDB-specific API; not based on the SQLite Node API.
- Lossless & efficient support for values of all DuckDB data types.
- Wraps released DuckDB binaries instead of rebuilding DuckDB.
- Built on DuckDB's C API; exposes more functionality.
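To make the "lossless" point concrete: DuckDB's BIGINT is a 64-bit integer, while a JavaScript `number` can only represent integers exactly up to `Number.MAX_SAFE_INTEGER` (2^53 - 1). The sketch below is plain TypeScript with no DuckDB calls; it only shows the precision loss that a number-based mapping would risk, not how either package actually converts values.

```typescript
// Plain TypeScript illustration (no DuckDB calls): why a lossless mapping of
// DuckDB's 64-bit BIGINT matters in JavaScript.
const bigintValue: bigint = 9007199254740993n; // 2^53 + 1, a valid BIGINT

// Lossy: a JS number is an IEEE-754 double with a 53-bit significand,
// so this value cannot be represented exactly and gets rounded.
const asNumber: number = Number(bigintValue);

// Lossless: keeping the value as a bigint preserves every digit.
const asBigInt: bigint = bigintValue;

console.log(Number.isSafeInteger(asNumber)); // false
console.log(asNumber === 9007199254740992);  // true: rounded to 2^53
console.log(asBigInt.toString());            // prints 9007199254740993
```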
Indeed, we created new Node bindings that use prebuilt libraries, which should greatly reduce build time. I would warmly recommend checking them out.
Thanks! After talking with you about this a few weeks ago, I plan to do the migration, but we will have to wait until Positron moves to arch-specific macOS installers (currently there is just a universal binary: https://positron.posit.co/download.html). I'm not sure what the timeline on that will be, but I'll update the thread here whenever it's time!
We could build a universal binary version of the new Node bindings for OS X if it helps?
Do you imagine opting in to the universal binary with an environment variable or similar? I think it's not a bad idea to have as an option, since many people are still creating universal app distributions for macOS (even though Intel seems to be going the way of PowerPC soon enough).
Yes, let me check with the maintainers. CC @jraymakers
I'll have to research whether it's possible to build an OS X universal binary using the Node addon build tool (node-gyp).
However, it may be possible to solve this in a different (perhaps simpler) way. If you bundle both @duckdb/node-bindings-darwin-x64 and @duckdb/node-bindings-darwin-arm64 in your Mac OS X (universal) package, by adding direct dependencies on both of them to your package.json (rather than relying on the optional dependencies from @duckdb/node-bindings), then the appropriate one for the user's platform should get loaded.
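The suggestion above can be sketched as selection logic over the bundled packages. This is a hypothetical illustration only, assuming the `@duckdb/node-bindings-<platform>-<arch>` naming seen in the two package names; `bundledBindings` and `bindingsPackageFor` are made-up names, and this is not how `@duckdb/node-bindings` actually resolves its platform package.

```typescript
// Hypothetical sketch, not the actual @duckdb/node-bindings loader: pick the
// per-platform bindings package for a given OS/CPU, assuming packages follow
// the "@duckdb/node-bindings-<platform>-<arch>" naming pattern.

// Both macOS variants listed as direct dependencies of a universal app bundle.
const bundledBindings: readonly string[] = [
  "@duckdb/node-bindings-darwin-x64",
  "@duckdb/node-bindings-darwin-arm64",
];

function bindingsPackageFor(platform: string, arch: string): string {
  const name = `@duckdb/node-bindings-${platform}-${arch}`;
  // Fail loudly if the app was not bundled with bindings for this platform.
  if (!bundledBindings.includes(name)) {
    throw new Error(`No bundled DuckDB bindings for ${platform}-${arch}`);
  }
  return name;
}

// An Apple Silicon Mac resolves to the arm64 package, an Intel Mac to x64.
console.log(bindingsPackageFor("darwin", "arm64"));
console.log(bindingsPackageFor("darwin", "x64"));
```

At runtime the two arguments would come from Node's `process.platform` and `process.arch`, so a single universal bundle carrying both packages can serve either kind of Mac.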