Batch destinations with S3/GCS storage use inserts for the /bulk endpoint and source syncs.
The problem
Destinations like BigQuery, Snowflake, and Redshift in batch mode use GCS or S3 as intermediate storage for batches.
Instead of issuing multiple inserts, Jitsu processes those batches with a COPY-from-file operation (or its equivalent). That method tends to be more time- and cost-effective.
However, COPY is used only for events coming from the /event and /s2s/event endpoints. For events coming from /bulk, Jitsu generates a sequence of UPSERTs (UPSERT is a virtual operation implemented differently for different destinations).
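To make the difference concrete, here is a minimal sketch that counts the operations a destination receives under each approach. It is not Jitsu code: `FakeWarehouse`, `load_row_by_row`, `load_via_staged_file`, and the in-memory `staged_files` dict (standing in for S3/GCS) are all hypothetical names made up for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeWarehouse:
    """Hypothetical stand-in for a destination; it only records operations."""
    operations: list = field(default_factory=list)

    def upsert(self, table, event):
        self.operations.append(("UPSERT", table))

    def copy_from_file(self, table, file_name):
        self.operations.append(("COPY", table, file_name))


def load_row_by_row(warehouse, table, events):
    # Behaviour described for /bulk: one UPSERT per event.
    for event in events:
        warehouse.upsert(table, event)


def load_via_staged_file(warehouse, staged_files, table, events):
    # Behaviour described for /event and /s2s/event: write the batch to
    # intermediate storage (a dict here, S3 or GCS in practice), then load
    # the whole file with a single COPY.
    file_name = f"{table}-batch-{len(staged_files)}.ndjson"
    staged_files[file_name] = "\n".join(json.dumps(e) for e in events)
    warehouse.copy_from_file(table, file_name)


events = [{"id": i} for i in range(1000)]

row_wh = FakeWarehouse()
load_row_by_row(row_wh, "events", events)

copy_wh, staged = FakeWarehouse(), {}
load_via_staged_file(copy_wh, staged, "events", events)

print(len(row_wh.operations), len(copy_wh.operations))  # prints: 1000 1
```

The sketch ignores deduplication and merge semantics, which a real UPSERT provides and a plain COPY does not necessarily provide; it only shows how the number of destination operations scales with batch size.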
Solution
For consistency and cost/time optimization, this should be changed: /bulk should send events through the same pipeline as /event and /s2s/event.
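The proposed change amounts to widening the set of endpoints that take the staged COPY path. A rough sketch of that routing decision, assuming a simple endpoint-to-strategy lookup (the constant and function names below are hypothetical and do not come from the Jitsu codebase):

```python
# Endpoints whose batches are staged in S3/GCS and loaded with COPY today,
# as described in this issue.
STAGED_ENDPOINTS_CURRENT = frozenset({"/event", "/s2s/event"})

# Proposed: /bulk joins the same pipeline.
STAGED_ENDPOINTS_PROPOSED = STAGED_ENDPOINTS_CURRENT | {"/bulk"}


def load_strategy(endpoint, staged_endpoints):
    """Return the (hypothetical) strategy label used for a given endpoint."""
    if endpoint in staged_endpoints:
        return "staged_copy"
    return "row_upserts"


for endpoint in ("/event", "/s2s/event", "/bulk"):
    before = load_strategy(endpoint, STAGED_ENDPOINTS_CURRENT)
    after = load_strategy(endpoint, STAGED_ENDPOINTS_PROPOSED)
    print(f"{endpoint}: {before} -> {after}")
```

With the proposed set, all three endpoints resolve to the same strategy, which is the consistency this issue asks for.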