
Migrate to using duckdb/duckdb-async npm package instead of duckdb-wasm

Open wesm opened this issue 1 year ago • 2 comments

As noted in #5332, duckdb-wasm does not support reading from compressed CSV/TSV files for some reason, and additionally multithreaded execution is not well supported in the WASM build (background discussed in https://duckdb.org/2021/10/29/duckdb-wasm.html), which means that performance in Positron will be hampered.

We could address both of these issues, multithreading and reading compressed files (without workarounds like decompressing a gzip file in-memory in NodeJS and then feeding it into DuckDB) by switching to duckdb-async, which is a TypeScript async wrapper for the duckdb NodeJS bindings. This is what https://github.com/antonycourtney/tad uses, for example. This is the ultimate direction that we need to go.

The downside of the duckdb NodeJS bindings is that the DuckDB C++ library gets built from source during npm install and statically linked into the NodeJS extension. I tinkered with this locally, and while it's possible to get the DuckDB build running in parallel on Linux/macOS, I couldn't figure out how to get it building in parallel on Windows. A single-threaded build of DuckDB takes well over 10 minutes, which is not ideal, so I'm not going to pursue this until we can resolve the build performance issue.

wesm avatar Jan 06 '25 21:01 wesm
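
For readers unfamiliar with the workaround mentioned above (decompressing a gzip file in memory in NodeJS before handing it to DuckDB), here is a minimal sketch of the decompression step only. It uses the standard `node:zlib` module; the `gunzipCsv` helper name and the sample data are hypothetical, and the step that feeds the resulting text into duckdb-wasm is deliberately left out because the thread does not show it.

```typescript
import { gzipSync, gunzipSync } from "node:zlib";

// Hypothetical helper: decompress a gzip-compressed CSV held in memory
// and return its contents as UTF-8 text.
function gunzipCsv(compressed: Uint8Array): string {
  return new TextDecoder("utf-8").decode(gunzipSync(compressed));
}

// Stand-in for the bytes of a .csv.gz file read from disk (made-up data).
const sampleGz = gzipSync(new TextEncoder().encode("a,b\n1,2\n"));

// The whole file is decompressed into memory before DuckDB sees it,
// which is the overhead the issue hopes to avoid.
const csvText = gunzipCsv(sampleGz);
```

The cost of this approach is that the full decompressed file lives in NodeJS memory before it is queried, which is why reading compressed files natively in DuckDB is preferable.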

Perhaps adopting the new Node.js API (Neo) would be ideal, as it would eliminate the build issue?

Main Differences from duckdb-node

https://duckdb.org/docs/clients/node_neo/overview.html

eitsupi avatar Feb 13 '25 12:02 eitsupi

Indeed we created new node bindings that use pre-built libraries. Should greatly reduce build time. I would warmly recommend checking it out.

hannes avatar May 28 '25 09:05 hannes

Thanks — after talking with you about this a few weeks ago, I plan to do the migration but we will have to wait until Positron moves to arch-specific macOS installers (currently there is just a universal binary: https://positron.posit.co/download.html). I'm not sure what the timeline on that will be but I'll update the thread here whenever it's time!

wesm avatar May 28 '25 17:05 wesm

We could build a universal binary version for OSX of the new node bindings if it helps?

hannes avatar Jun 02 '25 12:06 hannes

Do you imagine opting in to getting the universal binary with an environment variable or similar? I think it's not a bad idea to have as an option since many people are still creating universal app distributions for macOS (even though Intel seems to be going the way of PowerPC soon enough)

wesm avatar Jun 02 '25 21:06 wesm

Yes, let me check with the maintainers. CC @jraymakers

hannes avatar Jun 04 '25 11:06 hannes

I'll have to research whether it's possible to build an OS X universal binary using the Node addon build tool (node-gyp).

However, it may be possible to solve this in a different (perhaps simpler) way. If you bundle both @duckdb/node-bindings-darwin-x64 and @duckdb/node-bindings-darwin-arm64 in your Mac OS X (universal) package, by adding direct dependencies to both of these to your package.json (rather than relying on the optional dependencies from @duckdb/node-bindings), then the appropriate one for the user's platform should get loaded.

jraymakers avatar Jun 04 '25 16:06 jraymakers
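
The suggestion above can be sketched as a small helper that produces the two direct dependency entries for a universal macOS package. This is only an illustration under stated assumptions: the two package names come from the comment above, while the helper names (`bindingsPackageFor`, `macDirectDependencies`) and the idea of passing in a single version string are hypothetical. The version to use is not stated in the thread; presumably it should match the `@duckdb/node-bindings` version the project already depends on.

```typescript
// Architectures covered by a universal macOS build.
const MAC_ARCHES = ["x64", "arm64"] as const;

// Hypothetical helper: build a platform-specific bindings package name.
// Only the two darwin names are confirmed by the thread.
function bindingsPackageFor(platform: string, arch: string): string {
  return `@duckdb/node-bindings-${platform}-${arch}`;
}

// Hypothetical helper: the entries to add under "dependencies" in
// package.json so that both macOS builds are bundled, instead of relying
// on the optional dependencies of @duckdb/node-bindings.
// `version` is supplied by the caller; no specific version is assumed here.
function macDirectDependencies(version: string): Record<string, string> {
  const deps: Record<string, string> = {};
  for (const arch of MAC_ARCHES) {
    deps[bindingsPackageFor("darwin", arch)] = version;
  }
  return deps;
}
```

Calling `bindingsPackageFor` with the running process's platform and architecture (in NodeJS, `process.platform` and `process.arch`) gives the name of the package that would be expected to load on that machine.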