Issue Querying MotherDuck After v0.12.2 Upgrade
Summary
Dependencies
Dependency versions:
"dependencies": {
"@motherduck/wasm-client": "^0.6.6",
"@uwdata/vgplot": "^0.12.2",
"apache-arrow": "^17.0.0",
"react": "^18.3.1",
"react-dom": "^18.3.1"
},
➜ debug-motherduck-mosaic-v_0_12_2 git:(main) node --version
v21.0.0
Errors
When attempting to build plots using data fetched from MotherDuck, one of two errors occurs.
ERROR 1: getFieldInfo throws:
field-info.js:75 Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'column_type')
at getFieldInfo (field-info.js:75:19)
at async Promise.all (:5173/index 1)
at async queryFieldInfo (field-info.js:50:13)
at async Coordinator.initializeClient (Coordinator.js:251:24)
This happens when attempting to build a plot by querying data from MotherDuck.
ERROR 2: arrowToColumns throws:
hook.js:608 TypeError: data.toColumns is not a function
at arrowToColumns (to-data-columns.js:35:35)
at toDataColumns (to-data-columns.js:24:7)
at ConnectedMark.queryResult (Mark.js:177:17)
at Coordinator.js:207:24
While exploring this issue, I validated that I can execute DESCRIBE queries against the plot's source table directly using vg.coordinator().query().
If I execute a DESCRIBE query with the same exact quotes as generated by Mosaic's query building logic before attempting to create the plot, this error is thrown instead of ERROR 1. See src/App.tsx line 28 for sample code.
Repro
See reference repo here: https://github.com/enlore/debug-motherduck-mosaic-v_0_12_2
It should be good to go so you can clone, npm i, and npm run dev.
The repo includes a read only token so the app can connect to my personal MotherDuck free tier account for convenience.
-
src/App.tsxcontains the relevant logic - Set
CONNECT_TO_MOTHERDUCKtotrueto see ERROR 1 - Uncomment line 45 to see ERROR 2
- Set
CONNECT_TO_MOTHERDUCKtofalseto validate that Mosaic can build a line plot from a browser local DuckDB Wasm instance without issue - DuckDB version number is logged to the console for both code paths
Let me know if I can provide additional information.
Thank you.
I suspect the issue is that mosaic now uses flechette instead of arrow js. Can motherduck just return the binary blob instead of an arrow table object?
I will find out.
What would be the idea from that point? Hand the binary blob to Flechette to turn into a Table?
Yeah, something like https://github.com/uwdata/mosaic/blob/fba8d950eebaca638fb4bc1f6a771a78ad304d1a/packages/core/src/connectors/rest.js#L32.
Note that the decodeIPC method (which parses IPC bytes to a Flechette table) is exported for reuse:
https://github.com/uwdata/mosaic/blob/fba8d950eebaca638fb4bc1f6a771a78ad304d1a/packages/core/src/index.js#L19
Personally, I would prefer if data tools defaulted to proving Arrow bytes rather than a specific Arrow implementation, especially as the default (apache-arrow package) implementation does not (last time I checked) support all of the data types returned in DuckDB arrow results.
Chatted with the MotherDuck team, they suggested an approach for getting at the IPC data.
I implemented their suggestion into my sample project. See summary below and https://github.com/enlore/debug-motherduck-mosaic-v_0_12_2/blob/main/src/App.tsx for details.
Good news: seems to work as far as ensuring we are creating Flechette tables from query result byte streams.
Bad news: Mosaic is still choking on getFieldInfo.
Below are the key elements of getting at the IPC data directly.
// We need connection ID to poll against later and the MDConnection does not
// provide a useUnsafe method
//
// const connection = MDConnection.create({
// mdToken: token,
// });
// So we drop down a level to the DuckDB instance provided by MotherDuck's
// getAsyncDuckDb and get a connection from it
const duckDb = await getAsyncDuckDb({
mdToken: token,
})
// Cast to the DuckDB Wasm version of the AsyncDuckDBConnection type
const duckDbConn = await duckDb.connect() as unknown as AsyncDuckDBConnection
/**
* Start a query and poll it to see when it finishes. Get the IPC data and build a Uint8Array from it.
* @param conn AsyncDuckDBConnection
* @param sql string A DuckDB SQL query string
* @returns Promise<Uint8Array | undefined> The query results IPC data stream all buffered into one place
*/
async function getMDArrowIPC(conn: AsyncDuckDBConnection, sql: string): Promise<Uint8Array | undefined> {
return await conn.useUnsafe(async (bindings: AsyncDuckDB, connId: number) => {
try {
let header = await bindings.startPendingQuery(connId, sql);
while (header == null) {
header = await bindings.pollPendingQuery(connId);
}
// see the duckdb-wasm AsyncResultStreamIterator: https://github.com/duckdb/duckdb-wasm/blob/41c03bef54cbc310ba2afa2e5f71f7a22b38364f/packages/duckdb-wasm/src/parallel/async_connection.ts#L111-L158
const iter = new AsyncResultStreamIterator(bindings, connId, header);
const buffer = await accumulateIPCBuffer(iter);
return buffer;
} catch (error) {
// TODO blearg
console.error(error)
}
});
}
Finally we
function decodeIPCToFlechette(result: Uint8Array): Table {
const table = decodeIPC(result)
return table
}
In the below screenshot, an error is thrown because desc is undefined.
Errors:
Possibly related behavior: if I run a DESCRIBE query using the exact same quotes as the query generated by Mosaic before I build the plot, error goes away. Chart still seems busted though.
Code the runs the describe query:
const d2 = await vg.coordinator().query('DESCRIBE SELECT "tpep_pickup_datetime" AS "column" FROM "sample_data"."nyc"."taxi" AS "source"') as Table
console.info('d2', d2.toArray())
Screenshot:
Further, if I run a DESCRIBE query prior to building the plot without worrying about the quotes: error.
Code to run the describe query:
// Confirming that we can DESCRIBE successfully
const d1 = await vg.coordinator().query('DESCRIBE SELECT tpep_pickup_datetime AS column FROM sample_data.nyc.taxi AS source') as Table
console.info('d1', d1.toArray())
Screenshot of error, no plot:
Possibly related?
https://github.com/duckdb/duckdb/issues/16017
Possibly related?
Hmm, that definitely looks like it could break Mosaic in some cases. Thanks for flagging!
@jheer Has there been any update on this? As I too am seeing the issues described above.
@jheer Has there been any update on this? As I too am seeing the issues described above.
Hi @AndyM10. Based on the discussion in https://github.com/duckdb/duckdb/issues/16017, it looks like a fix for the DESCRIBE issue was added to DuckDB. I just verified that this is working in the WASM connector using the standard DuckDB-WASM release. I don't know if it has been taken up by MotherDuck yet.