pramen
Resilient data pipeline framework running on Apache Spark
## Background Currently, information date format and type are ignored when the metastore persistence format is 'delta'. For example, here the date format will be ignored: ```hocon pramen.metastore.tables = [...
## Background Running `count()` on a big transformation is too expensive. It can be avoided by always executing the transformation and then reading the record count from the metastore....
Closes #374 Closes #421 This PR adds 'incremental' as a schedule type, and mechanisms for managing offsets (experimental). Pramen `version 1.10` introduces the concept of incremental ingestion. It allows running...
## Background Partitioning of Delta Lake tables might actually worsen the efficiency of reads, especially for small tables. https://delta.io/blog/pros-cons-hive-style-partionining/ https://delta.io/blog/2023-06-03-delta-lake-z-order/ This feature is about adding a flag to make metastore...
## Background Calculating record count for non-cached transient jobs effectively doubles the calculation time. ## Feature Do not calculate record count for non-cached transient jobs. ## Example [Optional] A simple...
## Background Currently, there are 4 pipeline notification statuses: 1. Failed (no successful tasks or a fatal error) 2. Partial Success (some tasks succeeded, some failed) 3. Succeeded with warnings...
## Background Currently, Pramen supports only 2 email lists: one for successes and one for failures. Sometimes, for example depending on the error, emails can be routed to different...