
#374 Incremental Ingestion

Open yruslan opened this issue 5 months ago • 2 comments

Closes #374 Closes #421

This PR adds 'incremental' as a schedule type, and mechanisms for managing offsets (experimental).

Pramen version 1.10 introduces incremental ingestion. It allows running a pipeline multiple times a day without reprocessing data that has already been processed. To enable it, use the incremental schedule when defining your ingestion operation:

schedule = "incremental"

For incremental ingestion to work, you need to define a monotonically increasing field, called an offset. Usually this field is a counter or a record creation timestamp. The offset field is defined in your source, and the source must support incremental ingestion for this mode to be used.

offset.column {
  name = "created_at"
  type = "datetime"
}
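
To illustrate the general idea behind an offset column, here is a minimal Python sketch. It is not Pramen's implementation: the function `ingest_incrementally`, the `IncrementalResult` class, and the strict `>` comparison against the last stored offset are all assumptions made for the example. Each run keeps only the records whose offset is greater than the one recorded by the previous run, then remembers the new maximum.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class IncrementalResult:
    """Hypothetical container: the records picked up by a run and the offset to store."""
    records: List[Dict[str, Any]]
    new_offset: Optional[Any]


def ingest_incrementally(
    records: Iterable[Dict[str, Any]],
    offset_column: str,
    last_offset: Optional[Any] = None,
) -> IncrementalResult:
    """Select records with an offset greater than last_offset (assumed semantics)."""
    if last_offset is None:
        # First run: nothing has been processed yet, so take everything.
        new_records = list(records)
    else:
        new_records = [r for r in records if r[offset_column] > last_offset]

    # Keep the previous offset when no new data arrived.
    if new_records:
        new_offset = max(r[offset_column] for r in new_records)
    else:
        new_offset = last_offset
    return IncrementalResult(new_records, new_offset)


# Illustrative data only: a source table with a 'created_at' offset column.
source = [
    {"id": 1, "created_at": datetime(2024, 9, 18, 9, 0)},
    {"id": 2, "created_at": datetime(2024, 9, 18, 11, 0)},
]

first = ingest_incrementally(source, "created_at")

# A new record arrives before the second run on the same day.
source.append({"id": 3, "created_at": datetime(2024, 9, 18, 13, 0)})
second = ingest_incrementally(source, "created_at", first.new_offset)

# A third run with no new data picks up nothing and keeps the offset.
third = ingest_incrementally(source, "created_at", second.new_offset)
```

In this sketch the second run picks up only the record with `id` 3, which is why the offset field has to be monotonically increasing: a record written later with a smaller offset value would be skipped.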

Offset types available at the moment:

| Type     | Description                               |
|----------|-------------------------------------------|
| integral | Any integral type (short, int, long)      |
| datetime | A datetime or timestamp field             |
| string   | String or varchar(n) types only           |

Only ingestion jobs support the incremental schedule at the moment. Incremental transformations and sinks are planned for a later release.

yruslan, Sep 18 '24 13:09