
Uploading data into Snowflake is slow

radeusgd opened this issue 1 year ago • 3 comments

I've tried using select_into_database_table on a single-column, 50k row table.

In Postgres this takes roughly 0.7-2.2s, in SQLite also about 0.8s, but in Snowflake it took more than 5 minutes!

We are currently using batched insert queries with each batch consisting of 100 rows.
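
For context, a minimal JDBC sketch of what such a batched insert looks like (the table and column names here are made up for illustration): every executeBatch call is a separate round trip, so 50k rows at 100 rows per batch means about 500 round trips, which is likely where a high-latency backend like Snowflake loses time.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

// Rough shape of a 100-row batched insert over JDBC.
// `my_table`, `my_column` and the `rows` list are illustrative only.
public final class BatchedInsertSketch {
    private static final int BATCH_SIZE = 100;

    static void insertAll(Connection conn, List<String> rows) throws Exception {
        try (PreparedStatement stmt =
                 conn.prepareStatement("INSERT INTO my_table (my_column) VALUES (?)")) {
            int pending = 0;
            for (String value : rows) {
                stmt.setString(1, value);
                stmt.addBatch();
                if (++pending == BATCH_SIZE) {
                    stmt.executeBatch(); // one round trip per batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                stmt.executeBatch(); // flush the final partial batch
            }
        }
    }
}
```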

Perhaps we should tweak the batch size for Snowflake. Or, even better, we could use Snowflake's data loading features: PUT and COPY INTO, which allow uploading whole CSV files to the database. We could save the table to a temporary CSV file and upload it using this mechanism - very likely it would be much more efficient.
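
A rough sketch of that idea through the Snowflake JDBC driver is below. The target table name, the staging choice (the table's internal stage) and the CSV handling are all illustrative assumptions; real code would need proper quoting, escaping and NULL handling.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Sketch: dump the rows to a temporary CSV, PUT it into the table's internal
// stage, then bulk-load it with a single COPY INTO. Names are hypothetical.
public final class SnowflakeBulkUploadSketch {
    static void upload(Connection conn, List<String> rows) throws SQLException, IOException {
        Path csv = Files.createTempFile("upload-", ".csv");
        try {
            Files.write(csv, rows); // one value per line; real code must escape/quote values

            try (Statement stmt = conn.createStatement()) {
                // PUT uploads (and by default compresses) the local file into the
                // table's internal stage @%my_table.
                stmt.execute("PUT file://" + csv.toAbsolutePath() + " @%my_table");

                // COPY INTO loads the staged files into the table in one bulk operation.
                stmt.execute("COPY INTO my_table FROM @%my_table"
                        + " FILE_FORMAT = (TYPE = CSV)"
                        + " PURGE = TRUE"); // drop the staged file once it is loaded
            }
        } finally {
            Files.delete(csv);
        }
    }
}
```

The same two statements could also target a named or user stage; the main point is that the 50k rows travel as one compressed file instead of hundreds of INSERT batches.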

radeusgd commented Aug 14 '24 14:08

This should be a general solution for Postgres, SQL Server, and Snowflake.

GregoryTravis commented Oct 08 '24 12:10

> This should be a general solution for Postgres, SQL Server, and Snowflake.

The batched insert is more or less a general solution.

The problem highlighted in this ticket is that, for Snowflake, the batched insert is slower than it should be. PUT and COPY INTO are Snowflake-specific mechanisms for efficiently uploading larger amounts of data, and they may help in this particular case.

I'm not quite sure we can find a general solution that will be fast for all backends. Is the 0.7-2.2s for 50k rows in Postgres not satisfactory? If it isn't, we would probably need to test some baseline approach using COPY - but I'm not sure if it works over a remote connection (we'd have to check).
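
If we do end up testing a COPY-based baseline for Postgres: pgjdbc exposes COPY ... FROM STDIN through its CopyManager API, which streams the data over the existing client connection, so it should not need any file on the server side. A minimal sketch, with hypothetical table/column names and without proper CSV escaping:

```java
import java.io.StringReader;
import java.sql.Connection;
import java.util.List;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

// Sketch of COPY ... FROM STDIN via pgjdbc. The data is streamed through the
// existing connection, so it should also work against remote servers.
public final class PostgresCopySketch {
    static void copyRows(Connection conn, List<String> rows) throws Exception {
        // Real code should CSV-escape values and stream them instead of
        // building one large string in memory.
        String csv = String.join("\n", rows);

        CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
        copyManager.copyIn(
                "COPY my_table (my_column) FROM STDIN WITH (FORMAT csv)",
                new StringReader(csv));
    }
}
```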

radeusgd commented Oct 08 '24 12:10

We may want to start by tweaking the batch size for each backend, to see whether that alone gets us better performance.
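
One possible shape for that, sketched with hypothetical names (this is not the existing Database library internals): let each backend's dialect override the default batch size, so Snowflake can use a much larger batch than the current 100 rows if benchmarks confirm it helps.

```java
// Hypothetical per-backend tuning hook; names and values are illustrative only.
interface DialectTuning {
    /** Number of rows to send per INSERT batch for this backend. */
    default int insertBatchSize() {
        return 100; // the current default mentioned above
    }
}

final class SnowflakeTuning implements DialectTuning {
    @Override
    public int insertBatchSize() {
        return 10_000; // placeholder; the right value needs benchmarking
    }
}
```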

radeusgd commented Oct 08 '24 12:10