
Support concurrent batch evaluation for INCREMENTAL_BY_TIME_RANGE models in Snowflake

Open izeigerman opened this issue 10 months ago • 2 comments

Because SQLMesh uses the delete+insert strategy for INCREMENTAL_BY_TIME_RANGE models, and because Snowflake acquires a table-level lock for the entire duration of a transaction that executes a DELETE (source), individual batches cannot be evaluated concurrently.

The goal is to find a way to support concurrent batch evaluation in Snowflake.
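For reference, the per-batch pattern looks roughly like this (a sketch only; the table, staging source, and column names are hypothetical, not SQLMesh's actual generated SQL):

```sql
-- Hypothetical sketch of the delete+insert pattern run per batch.
-- In Snowflake, the DELETE takes a partition-level lock on the whole table
-- that is held until COMMIT, which serializes otherwise-independent batches.
BEGIN;
DELETE FROM my_model
 WHERE event_ts >= '2024-01-01' AND event_ts < '2024-01-02';
INSERT INTO my_model
SELECT *
  FROM staged_batch
 WHERE event_ts >= '2024-01-01' AND event_ts < '2024-01-02';
COMMIT;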

izeigerman avatar May 01 '25 04:05 izeigerman

One idea worth exploring is to include the time column in the table's clustering key. The theory is that micro-partitions would then contain contiguous ranges of timestamps, so the lock would only need to be acquired on a limited number of partitions rather than the entire table.

izeigerman avatar May 01 '25 04:05 izeigerman

Clustering is efficient for reads but very expensive for writes; Snowflake charges you whenever it has to recluster things. So, something to watch out for:

The more frequently a table is queried, the more benefit clustering provides. However, the more frequently a table changes, the more expensive it will be to keep it clustered. Therefore, clustering is generally most cost-effective for tables that are queried frequently and do not change frequently.

tobymao avatar May 01 '25 05:05 tobymao

The clustering key is a performance optimization and does not appear to have any effect on locking.

I ran a test that went like:

  • Create table (id, day) clustered by day
  • Insert some data for some days, 2024-01-01 and 2024-01-02
  • Begin transaction T1
  • Delete from table where day = '2024-01-01'. Statement completes but transaction T1 is still uncommitted.
  • Begin transaction T2
  • SHOW LOCKS - shows that T1 has a PARTITIONS lock on the table
  • Delete from table where day = '2024-01-02' (note: a different day than transaction T1)
  • ...(blocked)...
  • Commit T1
  • T2 Delete is immediately unblocked and executes as expected
  • Commit T2
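The steps above, written out as a two-session SQL transcript (a sketch; the table and column names are taken from the test description):

```sql
-- Setup (hypothetical names, matching the test description)
CREATE TABLE t (id INT, day DATE) CLUSTER BY (day);
INSERT INTO t VALUES (1, '2024-01-01'), (2, '2024-01-02');

-- Session 1
BEGIN;                                     -- transaction T1
DELETE FROM t WHERE day = '2024-01-01';   -- completes, but T1 is still uncommitted

-- Session 2
BEGIN;                                     -- transaction T2
SHOW LOCKS;                                -- shows T1 holding a PARTITIONS lock on t
DELETE FROM t WHERE day = '2024-01-02';   -- blocks, despite targeting a different day

-- Session 1
COMMIT;                                    -- T2's DELETE immediately unblocks

-- Session 2
COMMIT;
```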

The final result is as expected, but the moment a DELETE is issued, no other transaction can issue a DELETE on that table until the first transaction has completed.

However, INSERT statements in each transaction appear to run concurrently. You can issue an INSERT in transaction T2 without it being blocked, even if transaction T1 has issued a DELETE on the table. I suspect that under the hood Snowflake is appending new micro-partitions, which doesn't require a lock.

I can think of a couple of workarounds:

  • DELETE everything up front in a single operation outside of the batches (or maybe in the first batch), then run the INSERTs in parallel for the remaining batches. The downside is that each batch stops being self-contained, and I think it might require some hacks in the snapshot evaluator, but at least it should enable parallel loading.
  • Run a SELECT COUNT(*) first to check whether the range is empty, and only issue a DELETE if the range contains data. This would only work for the initial load / backfill and for loading new data on cadence runs; backfilling old data would have the same problem of having to DELETE first.
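The first workaround could be sketched like this (hypothetical names; the actual restatement range would come from the plan):

```sql
-- Workaround sketch: one up-front DELETE covering the full restatement range,
-- so each batch only INSERTs. INSERTs are append-only and don't block each
-- other, so the batches can then run in parallel.
DELETE FROM my_model
 WHERE event_ts >= '2024-01-01' AND event_ts < '2024-02-01';

-- Batch N (runs concurrently with the other batches):
INSERT INTO my_model
SELECT * FROM staged_batch_n;
```

The trade-off, as noted above, is that the DELETE and the INSERTs no longer share a transaction, so the table is in a partial state while the batches run.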

I still think the DELETE+INSERT strategy is correct because:

  • Snowflake hides away partitioning from you so there is no way to target specific partitions for replacement
  • INSERT OVERWRITE doesn't work because it overwrites the entire table rather than just a specified date range.
  • MERGE doesn't work because Snowflake doesn't support WHEN NOT MATCHED BY SOURCE, which means we can't detect when a row has been removed from the source table.
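To illustrate the MERGE limitation: the clause that would delete target rows absent from the source is exactly the one Snowflake lacks (a sketch; table and column names are hypothetical):

```sql
MERGE INTO my_model AS t
USING staged_batch AS s
   ON t.id = s.id
WHEN MATCHED THEN
  UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN
  INSERT (id, event_ts, value) VALUES (s.id, s.event_ts, s.value);
-- Needed to mirror deletions, but unsupported in Snowflake:
-- WHEN NOT MATCHED BY SOURCE THEN DELETE
```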

erindru avatar May 05 '25 06:05 erindru

Great analysis! I suspected as much but I figured it was worth a try.

DELETE everything up front in a single operation outside of the batches (or maybe in the first batch), then run the INSERTs in parallel for the remaining batches. The downside is that each batch stops being self-contained, and I think it might require some hacks in the snapshot evaluator, but at least it should enable parallel loading.

I think the problem with this approach is that DELETE and INSERT in this case will have to happen in separate transactions which means that external consumers will be exposed to partial data while the operation is running.

izeigerman avatar May 05 '25 16:05 izeigerman

In that case, if we can't relax SQLMesh's consistency guarantees, then we can't support concurrent batch evaluation on Snowflake unless Snowflake adds some kind of new feature that allows parts of the same table to be replaced in parallel.

Potentially this might work if the target is a Hybrid Table, but they come with their own set of limitations, namely they:

  • are designed for transactional workloads (not analytical); I suspect there is a Postgres database or something similar in the backend
  • only work for Snowflake AWS customers at the time of this writing
  • are limited to 2TB of storage

erindru avatar May 08 '25 22:05 erindru