databricks-sql-python
ExecuteMany performance is insanely bad
I know that the documentation makes it clear that executemany is a naive for loop:
"No optimizations of the query (like batching) will be performed."
But it's 2025. Please provide a more optimized executemany that issues a single SQL statement per batch, using a multi-row VALUES (...) clause or something similar, so that it is actually usable in a data pipeline. As it stands, using the Databricks SQL connector to write any non-trivial dataframe to a Delta table is pointless.
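To make the request concrete, here is a rough sketch of the kind of batching I mean. It is not the connector's API: `chunked`, `build_multirow_insert`, and `batched_executemany` are hypothetical helper names, and the `?` positional markers are an assumption (the paramstyle the connector accepts depends on its version, so the markers may need adapting). The only thing it expects from the cursor is a DB-API style `execute(sql, params)`.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Sequence, Tuple


def chunked(rows: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_multirow_insert(
    table: str, columns: Sequence[str], batch: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Build one INSERT ... VALUES (...), (...) statement and a flat parameter list.

    Assumption: the driver accepts `?` positional markers.
    `table` and `columns` are interpolated directly, so they must be trusted names.
    """
    row_marker = "(" + ", ".join("?" for _ in columns) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join(row_marker for _ in batch)
    )
    params = [value for row in batch for value in row]
    return sql, params


def batched_executemany(
    cursor: Any,
    table: str,
    columns: Sequence[str],
    rows: Iterable[Sequence[Any]],
    batch_size: int = 500,
) -> int:
    """Issue one multi-row INSERT per batch; return the number of statements sent."""
    statements = 0
    for batch in chunked(rows, batch_size):
        sql, params = build_multirow_insert(table, columns, batch)
        cursor.execute(sql, params)
        statements += 1
    return statements
```

With this shape, N rows cost roughly N / batch_size round trips instead of N. The batch size of 500 is an arbitrary placeholder and would have to stay under whatever limit the server places on statement size or parameter count.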
Thanks!
For reference, the Databricks ODBC driver with cursor.fast_executemany = True can do ~140 rows/s.
Still not great, but better than what this SQL connector can do.