mosaic icon indicating copy to clipboard operation
mosaic copied to clipboard

Issue Querying MotherDuck After v0.12.2 Upgrade

Open enlore opened this issue 11 months ago • 10 comments

Summary

Dependencies

Dependency versions:

  "dependencies": {
    "@motherduck/wasm-client": "^0.6.6",
    "@uwdata/vgplot": "^0.12.2",
    "apache-arrow": "^17.0.0",
    "react": "^18.3.1",
    "react-dom": "^18.3.1"
  },
➜  debug-motherduck-mosaic-v_0_12_2 git:(main)  node --version
v21.0.0

Errors

When attempting to build plots using data fetched from MotherDuck, one of two errors occurs.

ERROR 1: getFieldInfo throws:

field-info.js:75 Uncaught (in promise) TypeError: Cannot read properties of undefined (reading 'column_type')
   at getFieldInfo (field-info.js:75:19)
   at async Promise.all (:5173/index 1)
   at async queryFieldInfo (field-info.js:50:13)
   at async Coordinator.initializeClient (Coordinator.js:251:24)

This happens when attempting to build a plot by querying data from MotherDuck.

ERROR 2: arrowToColumns throws:

hook.js:608 TypeError: data.toColumns is not a function
  at arrowToColumns (to-data-columns.js:35:35)
  at toDataColumns (to-data-columns.js:24:7)
  at ConnectedMark.queryResult (Mark.js:177:17)
  at Coordinator.js:207:24

While exploring this issue, I validated that I can execute DESCRIBE queries against the plot's source table directly using vg.coordinator().query().

If I execute a DESCRIBE query with the same exact quotes as generated by Mosaic's query building logic before attempting to create the plot, this error is thrown instead of ERROR 1. See src/App.tsx line 28 for sample code.

Repro

See reference repo here: https://github.com/enlore/debug-motherduck-mosaic-v_0_12_2

It should be good to go so you can clone, npm i, and npm run dev.

The repo includes a read only token so the app can connect to my personal MotherDuck free tier account for convenience.

  • src/App.tsx contains the relevant logic
  • Set CONNECT_TO_MOTHERDUCK to true to see ERROR 1
  • Uncomment line 45 to see ERROR 2
  • Set CONNECT_TO_MOTHERDUCK to false to validate that Mosaic can build a line plot from a browser local DuckDB Wasm instance without issue
  • DuckDB version number is logged to the console for both code paths

Let me know if I can provide additional information.

Thank you.

enlore avatar Jan 31 '25 19:01 enlore

I suspect the issue is that mosaic now uses flechette instead of arrow js. Can motherduck just return the binary blob instead of an arrow table object?

domoritz avatar Jan 31 '25 23:01 domoritz

I will find out.

What would be the idea from that point? Hand the binary blob to Flechette to turn into a Table?

enlore avatar Feb 03 '25 14:02 enlore

Yeah, something like https://github.com/uwdata/mosaic/blob/fba8d950eebaca638fb4bc1f6a771a78ad304d1a/packages/core/src/connectors/rest.js#L32.

domoritz avatar Feb 03 '25 15:02 domoritz

Note that the decodeIPC method (which parses IPC bytes to a Flechette table) is exported for reuse: https://github.com/uwdata/mosaic/blob/fba8d950eebaca638fb4bc1f6a771a78ad304d1a/packages/core/src/index.js#L19

Personally, I would prefer if data tools defaulted to proving Arrow bytes rather than a specific Arrow implementation, especially as the default (apache-arrow package) implementation does not (last time I checked) support all of the data types returned in DuckDB arrow results.

jheer avatar Feb 03 '25 16:02 jheer

Chatted with the MotherDuck team, they suggested an approach for getting at the IPC data.

I implemented their suggestion into my sample project. See summary below and https://github.com/enlore/debug-motherduck-mosaic-v_0_12_2/blob/main/src/App.tsx for details.

Good news: seems to work as far as ensuring we are creating Flechette tables from query result byte streams.

Bad news: Mosaic is still choking on getFieldInfo.

Below are the key elements of getting at the IPC data directly.

  // We need connection ID to poll against later and the MDConnection does not 
  //   provide a useUnsafe method
  //
  // const connection = MDConnection.create({
  //   mdToken: token,
  // });

  // So we drop down a level to the DuckDB instance provided by MotherDuck's 
  //   getAsyncDuckDb and get a connection from it
  const duckDb = await getAsyncDuckDb({
    mdToken: token,
  })

  // Cast to the DuckDB Wasm version of the AsyncDuckDBConnection type
  const duckDbConn = await duckDb.connect() as unknown as AsyncDuckDBConnection
/**
 * Start a query and poll it to see when it finishes. Get the IPC data and build a Uint8Array from it.
 * @param conn AsyncDuckDBConnection
 * @param sql string A DuckDB SQL query string
 * @returns Promise<Uint8Array | undefined> The query results IPC data stream all buffered into one place
 */
async function getMDArrowIPC(conn: AsyncDuckDBConnection, sql: string): Promise<Uint8Array | undefined> {
  return await conn.useUnsafe(async (bindings: AsyncDuckDB, connId: number) => {
    try {
      let header = await bindings.startPendingQuery(connId, sql);
      
      while (header == null) {
          header = await bindings.pollPendingQuery(connId);
      }

      // see the duckdb-wasm AsyncResultStreamIterator: https://github.com/duckdb/duckdb-wasm/blob/41c03bef54cbc310ba2afa2e5f71f7a22b38364f/packages/duckdb-wasm/src/parallel/async_connection.ts#L111-L158 
      const iter = new AsyncResultStreamIterator(bindings, connId, header);
      
      const buffer = await accumulateIPCBuffer(iter);
     
      return buffer;

    } catch (error) {
        // TODO blearg
        console.error(error)
    }
  });
}

Finally we

function decodeIPCToFlechette(result: Uint8Array): Table {
  const table = decodeIPC(result)
  return table
}

enlore avatar Feb 04 '25 17:02 enlore

In the below screenshot, an error is thrown because desc is undefined.

Image

Errors:

Image

Possibly related behavior: if I run a DESCRIBE query using the exact same quotes as the query generated by Mosaic before I build the plot, error goes away. Chart still seems busted though.

Code the runs the describe query:

const d2 = await vg.coordinator().query('DESCRIBE SELECT "tpep_pickup_datetime" AS "column" FROM "sample_data"."nyc"."taxi" AS "source"') as Table
console.info('d2', d2.toArray())

Screenshot:

Image

Further, if I run a DESCRIBE query prior to building the plot without worrying about the quotes: error.

Code to run the describe query:

// Confirming that we can DESCRIBE successfully
const d1 = await vg.coordinator().query('DESCRIBE SELECT tpep_pickup_datetime AS column FROM sample_data.nyc.taxi AS source') as Table
console.info('d1', d1.toArray())

Screenshot of error, no plot:

Image

enlore avatar Feb 04 '25 17:02 enlore

Possibly related?

https://github.com/duckdb/duckdb/issues/16017

enlore avatar Feb 04 '25 18:02 enlore

Possibly related?

duckdb/duckdb#16017

Hmm, that definitely looks like it could break Mosaic in some cases. Thanks for flagging!

jheer avatar Feb 04 '25 18:02 jheer

@jheer Has there been any update on this? As I too am seeing the issues described above.

AndyM10 avatar Apr 27 '25 22:04 AndyM10

@jheer Has there been any update on this? As I too am seeing the issues described above.

Hi @AndyM10. Based on the discussion in https://github.com/duckdb/duckdb/issues/16017, it looks like a fix for the DESCRIBE issue was added to DuckDB. I just verified that this is working in the WASM connector using the standard DuckDB-WASM release. I don't know if it has been taken up by MotherDuck yet.

jheer avatar May 07 '25 18:05 jheer