Support for MotherDuck
Although the sql-example.ipynb notebook works with DuckDB, it fails with MotherDuck.
To connect to MotherDuck:

```python
# ---------------------------------
# MotherDuck (Cloud)
# ---------------------------------
!pip install "ibis-framework[duckdb]"

token = "<add your token>"
connection_uri = f"duckdb://md:{token}@test_db"
```
To get a token, you must first create an account; see https://motherduck.com/docs/authenticating-to-motherduck/.
The "add data to the datastore" cell fails with:

```
ProgrammingError: (duckdb.duckdb.InvalidInputException) Invalid Input Error: Attempting to execute an unsuccessful or closed pending query result
Error: Invalid Error: RPC 'SETUP_PLAN_FRAGMENTS' failed: Deadline Exceeded (DEADLINE_EXCEEDED, request id: '05515ef3-a7fe-4c9b-afc9-a57289617d16')
[SQL: INSERT INTO images (id, image, _fold) VALUES (?, ?, ?),
....
(Background on this error at: https://sqlalche.me/e/14/f405)
```
Here is the traceback:
---------------------------------------------------------------------------
InvalidInputException Traceback (most recent call last)
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1910, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
1909 if not evt_handled:
-> 1910 self.dialect.do_execute(
1911 cursor, statement, parameters, context
1912 )
1914 if self._has_events or self.engine._has_events:
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/sqlalchemy/engine/default.py:736, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
735 def do_execute(self, cursor, statement, parameters, context=None):
--> 736 cursor.execute(statement, parameters)
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/duckdb_engine/__init__.py:160, in ConnectionWrapper.execute(self, statement, parameters, context)
159 else:
--> 160 self.__c.execute(statement, parameters)
161 except RuntimeError as e:
InvalidInputException: Invalid Input Error: Attempting to execute an unsuccessful or closed pending query result
Error: Invalid Error: RPC 'SETUP_PLAN_FRAGMENTS' failed: Deadline Exceeded (DEADLINE_EXCEEDED, request id: '05515ef3-a7fe-4c9b-afc9-a57289617d16')
The above exception was the direct cause of the following exception:
ProgrammingError Traceback (most recent call last)
Cell In[48], line 2
1 # Insert data from the 'images_df' DataFrame into the 'images' table
----> 2 _ = db.execute(images.insert(images_df))
4 # Insert data from the 'captions_df' DataFrame into the 'captions' table
5 _ = db.execute(captions.insert(captions_df))
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/superduperdb/base/datalayer.py:386, in Datalayer.execute(self, query, *args, **kwargs)
384 return self.delete(query, *args, **kwargs)
385 if isinstance(query, Insert):
--> 386 return self.insert(query, *args, **kwargs)
387 if isinstance(query, Select):
388 return self.select(query, *args, **kwargs)
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/superduperdb/base/datalayer.py:427, in Datalayer.insert(self, insert, refresh, encoders)
425 if random.random() < s.CFG.fold_probability:
426 r['_fold'] = 'valid' # type: ignore[assignment]
--> 427 inserted_ids = insert.execute(self)
429 if refresh and self.cdc.running:
430 raise Exception('cdc cannot be activated and refresh=True')
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/superduperdb/backends/ibis/query.py:846, in IbisInsert.execute(self, db)
843 encoded_documents = self._encode_documents(table=table)
844 ids = [r[table.primary_id] for r in encoded_documents]
--> 846 db.databackend.insert(
847 self.table_or_collection.identifier, raw_documents=encoded_documents
848 )
849 return ids
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/superduperdb/backends/ibis/data_backend.py:36, in IbisDataBackend.insert(self, table_name, raw_documents)
34 def insert(self, table_name, raw_documents):
35 if not self.in_memory:
---> 36 self.conn.insert(table_name, raw_documents)
37 else:
38 self.conn.create_table(table_name, pandas.DataFrame(raw_documents))
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/ibis/backends/base/sql/alchemy/__init__.py:777, in BaseAlchemyBackend.insert(self, table_name, obj, database, overwrite)
775 if overwrite:
776 bind.execute(to_table.delete())
--> 777 bind.execute(to_table.insert().values(obj))
779 else:
780 raise ValueError(
781 "No operation is being performed. Either the obj parameter "
782 "is not a pandas DataFrame or is not a ibis Table."
783 f"The given obj is of type {type(obj).__name__} ."
784 )
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1385, in Connection.execute(self, statement, *multiparams, **params)
1381 util.raise_(
1382 exc.ObjectNotExecutableError(statement), replace_context=err
1383 )
1384 else:
-> 1385 return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/sqlalchemy/sql/elements.py:334, in ClauseElement._execute_on_connection(self, connection, multiparams, params, execution_options, _force)
330 def _execute_on_connection(
331 self, connection, multiparams, params, execution_options, _force=False
332 ):
333 if _force or self.supports_execution:
--> 334 return connection._execute_clauseelement(
335 self, multiparams, params, execution_options
336 )
337 else:
338 raise exc.ObjectNotExecutableError(self)
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1577, in Connection._execute_clauseelement(self, elem, multiparams, params, execution_options)
1565 compiled_cache = execution_options.get(
1566 "compiled_cache", self.engine._compiled_cache
1567 )
1569 compiled_sql, extracted_params, cache_hit = elem._compile_w_cache(
1570 dialect=dialect,
1571 compiled_cache=compiled_cache,
(...)
1575 linting=self.dialect.compiler_linting | compiler.WARN_LINTING,
1576 )
-> 1577 ret = self._execute_context(
1578 dialect,
1579 dialect.execution_ctx_cls._init_compiled,
1580 compiled_sql,
1581 distilled_params,
1582 execution_options,
1583 compiled_sql,
1584 distilled_params,
1585 elem,
1586 extracted_params,
1587 cache_hit=cache_hit,
1588 )
1589 if has_events:
1590 self.dispatch.after_execute(
1591 self,
1592 elem,
(...)
1596 ret,
1597 )
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1953, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
1950 branched.close()
1952 except BaseException as e:
-> 1953 self._handle_dbapi_exception(
1954 e, statement, parameters, cursor, context
1955 )
1957 return result
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/sqlalchemy/engine/base.py:2134, in Connection._handle_dbapi_exception(self, e, statement, parameters, cursor, context)
2132 util.raise_(newraise, with_traceback=exc_info[2], from_=e)
2133 elif should_wrap:
-> 2134 util.raise_(
2135 sqlalchemy_exception, with_traceback=exc_info[2], from_=e
2136 )
2137 else:
2138 util.raise_(exc_info[1], with_traceback=exc_info[2])
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/sqlalchemy/util/compat.py:211, in raise_(***failed resolving arguments***)
208 exception.__cause__ = replace_context
210 try:
--> 211 raise exception
212 finally:
213 # credit to
214 # https://cosmicpercolator.com/2016/01/13/exception-leaks-in-python-2-and-3/
215 # as the __traceback__ object creates a cycle
216 del exception, replace_context, from_, with_traceback
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/sqlalchemy/engine/base.py:1910, in Connection._execute_context(self, dialect, constructor, statement, parameters, execution_options, *args, **kw)
1908 break
1909 if not evt_handled:
-> 1910 self.dialect.do_execute(
1911 cursor, statement, parameters, context
1912 )
1914 if self._has_events or self.engine._has_events:
1915 self.dispatch.after_cursor_execute(
1916 self,
1917 cursor,
(...)
1921 context.executemany,
1922 )
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/sqlalchemy/engine/default.py:736, in DefaultDialect.do_execute(self, cursor, statement, parameters, context)
735 def do_execute(self, cursor, statement, parameters, context=None):
--> 736 cursor.execute(statement, parameters)
File ~/.pyenv/versions/superduper/lib/python3.11/site-packages/duckdb_engine/__init__.py:160, in ConnectionWrapper.execute(self, statement, parameters, context)
158 self.__c.execute(statement)
159 else:
--> 160 self.__c.execute(statement, parameters)
161 except RuntimeError as e:
162 if e.args[0].startswith("Not implemented Error"):
This is followed by the error:

```
ProgrammingError: (duckdb.duckdb.InvalidInputException) Invalid Input Error: Attempting to execute an unsuccessful or closed pending query result
Error: FATAL Error: Failed: database has been invalidated because of a previous fatal error. The database must be restarted prior to being used again.
Original error: "FATAL Error: You've encountered an internal MotherDuck error. You can help us diagnose and fix the issue by contacting support via slack or [email protected] and referencing error ID: 20240103a0d."
(Background on this error at: https://sqlalche.me/e/14/f405)
```
Here are the findings of the MotherDuck team:
We identified the issue and a possible workaround. Ibis inlines the blob values in the `INSERT INTO` statement, which makes the logical execution plan that has to be shipped between the DuckDB client and MotherDuck quite large, and this finally leads to the timeout (4 MB is the current maximum size we support; the plan in the example notebook was around 48 MB). We will add support for larger plans at some point, but I want to suggest a workaround.

Instead of inlining the BLOB values in the `INSERT INTO` statement, you can use DuckDB's zero-copy mechanism to insert values from `images_df` with a statement like this:

`INSERT INTO images (id, image, _fold) SELECT id, image, 'test' FROM images_df`

(instead of the `INSERT INTO images VALUES (...)` that is currently generated within `_ = db.execute(images.insert(images_df))`). Since the insert happens behind the Ibis abstraction, it might require some special-casing for MotherDuck, but overall I'd advise performing inserts in such a way even if we are going to add support for larger plans in the future.