Data Explorer: Create preliminary positron-duckdb extension using duckdb-wasm to provide "headless" data explorer backend
For epic #2187, addresses #4963.
This adds a new built-in positron-duckdb extension that loads duckdb-wasm in a web worker and exposes an RPC endpoint through VS Code's command service for fulfilling Data Explorer requests. Only fetching schemas, data values, and null-count summary statistics is supported right now, so follow-on work includes:
- Numeric formatting and string truncation (respecting the passed FormatOptions)
- Row filtering
- Sorting
- Detailed summary statistics
- Histograms and frequency tables for sparklines
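To make the "RPC endpoint over a command service" idea concrete, here is a minimal, self-contained sketch of the pattern. Every name in it (CommandRegistry, RpcRequest, RpcResponse, the command id, and the method strings) is hypothetical and chosen for illustration. It does not use the real VS Code API and is not the extension's actual protocol.

```typescript
// Hypothetical types: a request names a method and carries its parameters.
type RpcRequest = { method: string; params: Record<string, unknown> };
type RpcResponse = { result?: unknown; error_message?: string };
type Handler = (req: RpcRequest) => Promise<RpcResponse>;

// Minimal stand-in for a command service: maps a command id to a handler.
class CommandRegistry {
    private readonly handlers = new Map<string, Handler>();

    register(id: string, handler: Handler): void {
        this.handlers.set(id, handler);
    }

    async execute(id: string, req: RpcRequest): Promise<RpcResponse> {
        const handler = this.handlers.get(id);
        if (!handler) {
            // Comparable to calling the backend before the extension has loaded.
            return { error_message: `command '${id}' not found` };
        }
        return handler(req);
    }
}

// Backend side: a single command acts as the RPC endpoint and dispatches
// on the method name. Unsupported methods produce an error response.
async function dataExplorerRpc(req: RpcRequest): Promise<RpcResponse> {
    switch (req.method) {
        case 'get_schema':
            // Placeholder payload; a real backend would query DuckDB here.
            return { result: { columns: [] } };
        default:
            return { error_message: `unsupported method: ${req.method}` };
    }
}
```

In this sketch the caller only needs to know one command id. New request types become new cases in the dispatcher, not new commands.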
There are some rough edges. For example, if you click on a file before the extension has fully loaded at application startup, opening it will fail, so I will need to consult others on how to fix that.
Lastly, I have checked in some small (~10K total) data files to use in the extension tests (yarn test-extension -l positron-duckdb) and added exclusions to hygiene.js so that pre-commit checks do not complain about them. I'm not sure if there is a better way to handle this.
Other notes:
- Added code to comms/generate-comms.ts to generate interfaces containing all the parameters for each RPC, as already exists for Rust and Python. This was needed to provide a fully formed command protocol for communicating with the extension. We can potentially look at further improving the TypeScript code generation.
- I copied the interface stubs needed into an interfaces.ts file in the extension. Maybe it's possible to cross-import from the main codebase into the extension but I do not know the right incantation of tsconfig.json/package.json configurations to do this.
In action
https://github.com/user-attachments/assets/70dabb96-6330-49e4-8db1-10293c331051
QA Notes
You can click on .parquet, .csv, or .tsv files in the file explorer after Positron has loaded to open the data explorer.
I'm not planning to do any more work in this branch, and will work on additional features in a branch based on this one until it gets merged.
One thing I could use some help on is how to determine when the built-in positron-duckdb extension has been loaded (if you click on a file too fast when the application is initializing, it will create a broken data explorer). I could add some sleep/retry logic but maybe there is a cleaner way to wait for built-in extensions to be loaded.
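For reference, the sleep/retry fallback mentioned above could look roughly like the following. This is a sketch only: withRetry and sleep are hypothetical helpers, not part of Positron or VS Code, and the attempt count and delay are left to the caller.

```typescript
// Hypothetical helper: resolves after the given number of milliseconds.
function sleep(ms: number): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: runs `op` up to `maxAttempts` times, waiting
// `delayMs` between failed attempts, and rethrows the last error.
async function withRetry<T>(
    op: () => Promise<T>,
    maxAttempts: number,
    delayMs: number
): Promise<T> {
    if (maxAttempts < 1) {
        throw new Error('maxAttempts must be at least 1');
    }
    let lastError: unknown;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return await op();
        } catch (err) {
            lastError = err;
            // Do not sleep after the final failed attempt.
            if (attempt < maxAttempts) {
                await sleep(delayMs);
            }
        }
    }
    throw lastError;
}
```

The obvious downside is that retrying hides the real readiness signal, which is why a cleaner way to wait for built-in extensions would be preferable.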
> One thing I could use some help on is how to determine when the built-in positron-duckdb extension has been loaded
This is a surprisingly hard problem in the VS Code system that I also ran into when trying to resolve all the asynchronous behavior around runtime startup. If your extension activates eagerly, you can use whenAllExtensionHostsStarted (which I added to solve a related problem); if not then the easiest way through is to have your extension invoke a command in its activate() method.
> If your extension activates eagerly, you can use whenAllExtensionHostsStarted
It does activate eagerly, so I'll use that! How do you tell which positron-* extensions activate eagerly and which do not? Mine does so by chance, because I copied lines of code from other Positron built-in extensions, not by design.
I added
await this._extensionService.whenAllExtensionHostsStarted();
to the main _execRpc method, and it seems to hang and never resolve, both before and after the application loading phase completes. So maybe we'll have to figure that out in a follow-up PR.
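One way to keep an await like this from hanging indefinitely while the root cause is investigated is to wrap it in a timeout. The sketch below assumes a hypothetical withTimeout helper. It is not an existing VS Code or Positron API, and it does not explain why the promise never resolves.

```typescript
// Hypothetical helper: settles with the outcome of `promise`, or rejects
// with a timeout error if `promise` has not settled within `ms`.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
    return new Promise<T>((resolve, reject) => {
        const timer = setTimeout(
            () => reject(new Error(`${label} timed out after ${ms} ms`)),
            ms
        );
        promise.then(
            (value) => {
                clearTimeout(timer);
                resolve(value);
            },
            (err) => {
                clearTimeout(timer);
                reject(err);
            }
        );
    });
}
```

With this helper, the call in _execRpc could be written as withTimeout(this._extensionService.whenAllExtensionHostsStarted(), 5000, 'whenAllExtensionHostsStarted'), where 5000 is an arbitrary value, so a stuck promise surfaces as an error and the RPC does not hang.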
I'll try to make a release build and check that everything still works there.
I tried making a release build and it has a webpack error:
[16:27:00] Bundled extension: positron-duckdb/extension.webpack.config.js...
[16:27:00] 'vscode' errored after 38 min
[16:27:00] Error: ModuleDependencyWarning: Critical dependency: the request of a dependency is an expression
ModuleDependencyWarning: Critical dependency: the request of a dependency is an expression
ModuleDependencyWarning: Critical dependency: the request of a dependency is an expression
ModuleDependencyWarning: Critical dependency: the request of a dependency is an expression
at formatError (/home/wesm/code/positron/node_modules/gulp-cli/lib/versioned/^4.0.0/format-error.js:21:10)
at Gulp.<anonymous> (/home/wesm/code/positron/node_modules/gulp-cli/lib/versioned/^4.0.0/log/events.js:33:15)
at Gulp.emit (node:events:531:35)
at Gulp.emit (node:domain:488:12)
at Object.error (/home/wesm/code/positron/node_modules/undertaker/lib/helpers/createExtensions.js:61:10)
at handler (/home/wesm/code/positron/node_modules/now-and-later/lib/map.js:50:14)
at f (/home/wesm/code/positron/node_modules/once/once.js:25:25)
at f (/home/wesm/code/positron/node_modules/once/once.js:25:25)
at tryCatch (/home/wesm/code/positron/node_modules/async-done/index.js:24:15)
at done (/home/wesm/code/positron/node_modules/async-done/index.js:40:12)
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
The code path that webpack seems to object to, because of dynamic resolution, is this one:
const modPath = require.resolve('@duckdb/duckdb-wasm');
const dist_path = dirname(modPath);
const MANUAL_BUNDLES = {
    mvp: {
        mainModule: resolve(dist_path, './duckdb-mvp.wasm'),
        mainWorker: resolve(dist_path, './duckdb-node-mvp.worker.cjs')
    },
    eh: {
        mainModule: resolve(dist_path, './duckdb-eh.wasm'),
        mainWorker: resolve(dist_path, './duckdb-node-eh.worker.cjs')
    }
};
const bundle = await duckdb.selectBundle(MANUAL_BUNDLES);
The duckdb-wasm documentation has a section about use with webpack, but I tinkered with this and wasn't able to get it working, and I don't really know what I'm doing, so I'm going to need some help from others @petetronic @jmcphers @seeM
https://duckdb.org/docs/api/wasm/instantiation.html#webpack
Here's what ChatGPT has to say on the matter if it is not hallucinating:
https://gist.github.com/wesm/f6e227b72653167dbc966031e7933782
It seems like we will have to do some work to get the wasm bundles loading in both a development context and a webpack context, maybe similar to the tree-sitter-wasm stuff. Let me know if someone is available to help me with this, and in the meantime I'll work on follow-on data explorer features that use this in a separate branch.
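As a rough illustration of one direction, the bundle paths could be built from a directory that is passed in explicitly, with no module resolution at runtime. The nodeBundlesFrom helper and BundlePaths type below are hypothetical. The file names are the ones from the snippet above. The sketch assumes the wasm and worker files have been copied to a known location, which is itself an assumption about the build layout. It is only an illustration, not a confirmed fix for the webpack warning.

```typescript
// Hypothetical shape mirroring the entries of the manual bundle object.
interface BundlePaths {
    mainModule: string;
    mainWorker: string;
}

// Hypothetical helper: builds the mvp/eh bundle paths from a known directory.
// Plain string joining with '/' keeps the sketch dependency-free; real code
// would more likely use Node's path.join.
function nodeBundlesFrom(distDir: string): { mvp: BundlePaths; eh: BundlePaths } {
    // Strip trailing slashes so the joined paths do not contain "//".
    const base = distDir.replace(/\/+$/, '');
    return {
        mvp: {
            mainModule: `${base}/duckdb-mvp.wasm`,
            mainWorker: `${base}/duckdb-node-mvp.worker.cjs`
        },
        eh: {
            mainModule: `${base}/duckdb-eh.wasm`,
            mainWorker: `${base}/duckdb-node-eh.worker.cjs`
        }
    };
}
```

The returned object has the same shape as MANUAL_BUNDLES, so it could be handed to duckdb.selectBundle in the same way.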
I've spent half my weekend trying to get the webpack build to work and I'm completely stumped.
I have an error like:
$ yarn gulp compile-extensions-build
<SNIP>
[14:47:46] Bundled extension: positron-duckdb/extension.webpack.config.js...
[14:47:47] 'compile-extensions-build' errored after 1.25 min
[14:47:47] Error: ModuleDependencyWarning: Critical dependency: the request of a dependency is an expression
ModuleDependencyWarning: Critical dependency: the request of a dependency is an expression
ModuleDependencyWarning: Critical dependency: the request of a dependency is an expression
ModuleDependencyWarning: Critical dependency: the request of a dependency is an expression
at formatError (/home/wesm/code/positron/node_modules/gulp-cli/lib/versioned/^4.0.0/format-error.js:21:10)
at Gulp.<anonymous> (/home/wesm/code/positron/node_modules/gulp-cli/lib/versioned/^4.0.0/log/events.js:33:15)
at Gulp.emit (node:events:531:35)
at Gulp.emit (node:domain:488:12)
at Object.error (/home/wesm/code/positron/node_modules/undertaker/lib/helpers/createExtensions.js:61:10)
at handler (/home/wesm/code/positron/node_modules/now-and-later/lib/map.js:50:14)
at f (/home/wesm/code/positron/node_modules/once/once.js:25:25)
at f (/home/wesm/code/positron/node_modules/once/once.js:25:25)
at tryCatch (/home/wesm/code/positron/node_modules/async-done/index.js:24:15)
at done (/home/wesm/code/positron/node_modules/async-done/index.js:40:12)
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
I fiddled with the webpack config to try to isolate the Node.js DuckDB wasm configuration (e.g. using webpack.IgnorePlugin), but webpack still appears to be trying to analyze and bundle the getDuckDBNodeBundles function. So I'm going to stop here and send out an SOS for someone else to help figure this out.
In case it helps others, I found this webpack-based web app in the duckdb-wasm repository:
https://github.com/duckdb/duckdb-wasm/tree/main/packages/duckdb-wasm-app
I'm going to stop spending more time on this before I tear all my hair out =)
I don't believe I'm going to be able to get this working on my own, so I am going to stop fiddling with it and making more of a mess.
I tested release builds locally on Linux and macOS, so I'm going ahead and merging this. Thanks @jmcphers for the save on the webpack issues!