sling-cli icon indicating copy to clipboard operation
sling-cli copied to clipboard

Add support for custom SQL flexibility in backfill mode

Open royandhika opened this issue 8 months ago • 2 comments

Based on the Sling documentation and GitHub discussions, I couldn't find a way to perform backfill replication using customSQL mode with the same flexibility as the {incremental_value} placeholder in incremental mode. Source: slingdocs

From what I’ve seen, backfill mode only supports {incremental_where_cond}, which is automatically constructed as:

WHERE {update_key} >= {start_value} AND {update_key} <= {end_value}

Source: #41

Based on that statement, I tried to write the condition manually inside the custom SQL query like this:

defaults:
  object: '{ENV_SCHEMA}_dl.raw_{stream_table}'
  mode: backfill
  primary_key: [id]
  update_key: created_time
  source_options:
    range: '2024-11-10, 2024-11-11'

env:
  SLING_LOADED_AT_COLUMN: timestamp

streams:
  schema.table:
    update_key: modified_time
    sql: |
      SELECT
        *
        ,COALESCE(last_info_time, created_time) AS modified_time
      FROM {stream_schema}.{stream_table}
      WHERE COALESCE(last_info_time, created_time) >= {start_value}
        AND COALESCE(last_info_time, created_time) <= {end_value}
      ORDER BY COALESCE(last_info_time, created_time)

But this didn’t work as expected.

Would it be possible to support this as a new feature? I believe it would be helpful to have more flexibility when using backfill mode, especially when dealing with multiple {update_key} values inside custom SQL.

Thanks! #215 #41

royandhika avatar Apr 08 '25 04:04 royandhika

Hey @royandhika, have you tried a CTE?

with source as(
      SELECT
        *, COALESCE(last_info_time, created_time) AS modified_time
      FROM {stream_schema}.{stream_table}
)
select * from source
WHERE {incremental_where_cond}
ORDER BY modified_time

flarco avatar Apr 08 '25 18:04 flarco

Hey @royandhika, have you tried a CTE?

with source as( SELECT *, COALESCE(last_info_time, created_time) AS modified_time FROM {stream_schema}.{stream_table} ) select * from source WHERE {incremental_where_cond} ORDER BY modified_time

Wow, yeah, never really thought of that! I tried it and it actually works! Still, I think having able to custom {start_value} and {end_value} doesn’t seem like a bad idea at all. Thanks for the CTE tip!

royandhika avatar Apr 10 '25 06:04 royandhika

{start_value} and {end_value} implemented a while back. Closing

flarco avatar Jun 24 '25 23:06 flarco