cloudquery icon indicating copy to clipboard operation
cloudquery copied to clipboard

feat: `_cq_sync_time` for long lived processes

Open bbernays opened this issue 2 years ago • 4 comments

Which problem is this feature request solving?

Currently for long lived syncing processes like AWS event-based sync, the _cq_sync_time reflects the time that the process began not which is much less meaningful when syncs can last days or weeks.

Describe the solution you'd like

There are a few options:

  1. Enable the plugin to set the _cq_sync_time and have the CLI respect that value and not add another column/field
  2. Add a _cq_last_updated timestamp value that represents the last time the record was updated. For append mode this column would never be updated, but for update and delete-stale it would be updated every time the record is updated. The Value of the timestamp would either be set by the destination plugin at the time it was added to the batch

Pull request (optional)

  • [ ] I can submit a pull request

bbernays avatar Dec 13 '23 22:12 bbernays

@marianogappa - Can the new basic transformation plugin be used to inject the timestamp?

bbernays avatar Aug 09 '24 15:08 bbernays

Not with the basic transformer as it stands, because it can only add literal string columns.

I'm working on an advanced transformer (hopefully PoC early next week) that might enable this.

I'm not too familiar with the use case; in principle it sounds like it should be more of a built in thing rather than something to be solved via a transformer. Could you define the requirement further?

marianogappa avatar Aug 09 '24 16:08 marianogappa

If the transformer could inject a new column named _cq_last_updated where the value is the current time, that would be enoug

bbernays avatar Aug 09 '24 16:08 bbernays

I think this can be done via a custom transformer. Users can write their own transformer to inject whatever they want.

erezrokah avatar Aug 27 '24 13:08 erezrokah

The basic transformation plugin, now allows users to add a column whose value is the value at the time that the transformation plugin processes the data.

This means that users of the event-based-sync can have a timestamp in their database that corresponds to the time at which a record was synced. Here is an example of the configuration (transformation and destination):

kind: destination
spec:
  name: "postgresql"
  path: "cloudquery/postgresql"
  registry: "cloudquery"
  version: "v8.2.8"
  # Learn more about the configuration options at https://cql.ink/postgresql_destination
  transformers:
    - basic # we add the basic transformer here
  spec:
    connection_string: "postgres://postgres:pass@localhost:5432/postgres?sslmode=disable" # set the environment variable in a format like 
---
kind: transformer
spec:
  name: "basic"
  registry: cloudquery
  path: "cloudquery/basic"
  version: "v1.1.0"
  spec:
    transformations:
      - kind: add_current_timestamp_column
        tables: ["*"]
        name: "_record_processed_at"

bbernays avatar Sep 13 '24 21:09 bbernays