ExecuteMany performance is insanely bad

Open diwu-sf opened this issue 9 months ago • 1 comment

I know that the documentation makes it clear that executemany is a naive for loop:

"No optimizations of the query (like batching) will be performed."

But it's 2025. Please provide a more optimized executemany that issues a single SQL statement for many rows (for example, one INSERT with a multi-row VALUES (...), (...) list) so that it is actually usable in a data pipeline. Otherwise, using the Databricks SQL connector to write any non-trivial dataframe to a Delta table is pointless.
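To make the request concrete, here is a minimal client-side sketch of the batching idea. It rests on assumptions: a DB-API style cursor whose execute() accepts `?` positional placeholders with a flat parameter list, and trusted table and column names. The helper names (chunked, build_multirow_insert, insert_batched) and the default batch size of 500 are hypothetical and not part of the connector.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List, Sequence, Tuple


def chunked(rows: Iterable[Sequence[Any]], size: int) -> Iterator[List[Sequence[Any]]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def build_multirow_insert(
    table: str, columns: Sequence[str], batch: Sequence[Sequence[Any]]
) -> Tuple[str, List[Any]]:
    """Return (sql, params) for one INSERT covering every row in `batch`.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input. The `?` placeholder style
    is an assumption; adjust it to whatever your driver expects.
    """
    for row in batch:
        if len(row) != len(columns):
            raise ValueError("row length does not match number of columns")
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    sql = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join(row_placeholder for _ in batch)
    )
    # Flatten the rows into one parameter list, in placeholder order.
    params = [value for row in batch for value in row]
    return sql, params


def insert_batched(cursor, table, columns, rows, batch_size=500):
    """Issue one execute() per batch instead of one per row."""
    total = 0
    for batch in chunked(rows, batch_size):
        sql, params = build_multirow_insert(table, columns, batch)
        cursor.execute(sql, params)
        total += len(batch)
    return total
```

With this shape, writing 10,000 rows at a batch size of 500 takes 20 statements instead of 10,000. The batch size has to stay under whatever per-statement parameter or size limits the server enforces, which I have not measured here.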

Thanks!

May 19 '25 00:05 diwu-sf

For reference, the Databricks ODBC driver with cursor.fast_executemany = True (a pyodbc setting) can do ~140 rows/s. That is still not great, but it is better than what this SQL connector can do.

Jun 12 '25 01:06 diwu-sf