Batch destinations with S3/GCS storages use inserts for /bulk endpoint and source syncs.

Open absorbb opened this issue 3 years ago • 0 comments

The problem

Destinations such as BigQuery, Snowflake, and Redshift use GCS or S3 as intermediate storage for batches when running in batch mode. Instead of issuing multiple inserts, Jitsu loads each batch with a COPY (or equivalent) from-file operation. That method tends to be more time- and cost-effective.

However, COPY is used only for events coming from the /event and /s2s/event endpoints. For events coming from /bulk, Jitsu generates a sequence of UPSERTs (UPSERT is a virtual operation implemented differently for each destination).
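
To make the difference concrete, here is a minimal sketch of the two load paths. It is not Jitsu's code: `FakeWarehouse`, `load_with_upserts`, `load_with_copy`, the table name, and the object key are all hypothetical, and the SQL strings are placeholders rather than real destination syntax. It only counts how many statements each path sends for the same batch.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeWarehouse:
    """Hypothetical stand-in that only records the operations it receives."""
    statements: list = field(default_factory=list)
    staged_files: dict = field(default_factory=dict)

    def execute(self, sql):
        self.statements.append(sql)

    def upload(self, key, payload):
        # Stands in for writing a file to S3 or GCS.
        self.staged_files[key] = payload


def load_with_upserts(warehouse, table, events):
    """Per-event path: one statement per event, as described for /bulk."""
    for event in events:
        # "UPSERT" is a placeholder; real destinations implement it differently.
        warehouse.execute(f"UPSERT INTO {table} /* event_id={event['event_id']} */")


def load_with_copy(warehouse, table, events, key):
    """Staged path: one newline-delimited JSON file, then a single COPY."""
    payload = "\n".join(json.dumps(e, sort_keys=True) for e in events)
    warehouse.upload(key, payload)
    warehouse.execute(f"COPY {table} FROM '{key}'")


events = [{"event_id": str(i), "type": "pageview"} for i in range(1000)]

upsert_wh = FakeWarehouse()
load_with_upserts(upsert_wh, "events", events)

copy_wh = FakeWarehouse()
load_with_copy(copy_wh, "events", events, "batches/batch-0001.ndjson")

# Statements sent to the warehouse by each path for the same 1000 events.
print(len(upsert_wh.statements), len(copy_wh.statements))
```

With 1000 events, the per-event path sends 1000 statements, while the staged path sends one file upload and one COPY.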

Solution

For consistency and cost/time optimization, this should be changed: /bulk should send events through the same pipeline as /event and /s2s/event.

absorbb · Apr 04 '22 13:04