pramen
Resilient data pipeline framework running on Apache Spark
## Describe the bug Originally, this happened when decimal correction was used with Hive and there were columns of type decimal(38,18). Pramen tries to 'correct' the schema by applying a...
## Feature Improve unit test coverage of ResultSetToRowIterator.scala.
## Background Currently, incremental updates are made by overwriting the latest info date partitions multiple times a day. This can be inefficient, especially for big tables with many events. If...
## Background The CDC transformer can take: - a table ingested as an initial snapshot, and then changes only - the primary key - the pre-combine key (timestamp) and transform...
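Assuming the transformer's job is to collapse the snapshot plus change stream into the latest state of each record (an assumption, since the description is cut off), the core merge logic can be sketched in plain Python: for each primary key, keep the row with the greatest pre-combine value. The function name `collapse_cdc`, the field names `id`, `ts`, `op`, and the delete marker `'D'` are all hypothetical.

```python
from itertools import chain
from typing import Any, Dict, Iterable, List

Row = Dict[str, Any]


def collapse_cdc(
    snapshot: Iterable[Row],
    changes: Iterable[Row],
    key: str = "id",
    precombine: str = "ts",
    op_field: str = "op",
    delete_op: str = "D",
) -> List[Row]:
    """Hypothetical sketch: merge a snapshot with a change stream, keeping the
    latest row per primary key as ordered by the pre-combine key."""
    latest: Dict[Any, Row] = {}
    # Snapshot rows are seen first, so on equal pre-combine values a change wins.
    for row in chain(snapshot, changes):
        current = latest.get(row[key])
        if current is None or row[precombine] >= current[precombine]:
            latest[row[key]] = row
    # Drop keys whose latest version is a delete marker (hypothetical 'D' op).
    survivors = [r for r in latest.values() if r.get(op_field) != delete_op]
    return sorted(survivors, key=lambda r: r[key])
```

Because the pre-combine key decides the winner, changes that arrive out of order (a `ts=4` update after a `ts=5` update) do not overwrite newer data.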
## Background This is a requirement for Enceladus and Spark versions that do not support committers without copying data. ## Feature Add support for S3 versions cleanup via a...
## Background Users want to customize SQL queries with information date based dates as part of an expression, including table names. For example: ``` SELECT * FROM my_table_202402 ``` where...
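The placeholder syntax the feature would use is not given here, so as a purely hypothetical illustration, the sketch below expands a `{info_date:FORMAT}` token with Python's `strftime`, turning a template into a concrete table name such as `my_table_202402`. Both the token syntax and `render_query` are assumptions, not Pramen's API.

```python
import re
from datetime import date

# Hypothetical token syntax: {info_date:<strftime format>}
_PLACEHOLDER = re.compile(r"\{info_date:([^}]*)\}")


def render_query(template: str, info_date: date) -> str:
    """Replace every {info_date:FORMAT} token with info_date formatted via strftime."""
    return _PLACEHOLDER.sub(lambda m: info_date.strftime(m.group(1)), template)
```

For example, `render_query("SELECT * FROM my_table_{info_date:%Y%m}", date(2024, 2, 15))` yields `"SELECT * FROM my_table_202402"`. Embedding the format inside the token lets one template produce monthly, daily, or dashed date suffixes without extra configuration keys.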
## Background When an event table is ingested initially, the history can be quite long. Executing each event date one by one can have a big overhead and execute many...
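One way to cut the per-date overhead, sketched here as an assumption rather than the proposed design, is to split the historical backlog into contiguous date ranges of bounded size and load each range with a single query instead of one query per info date. The helper `date_batches` and its `max_days` parameter are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple


def date_batches(start: date, end: date, max_days: int) -> List[Tuple[date, date]]:
    """Split the inclusive range [start, end] into consecutive inclusive
    sub-ranges of at most max_days days each."""
    if max_days < 1:
        raise ValueError("max_days must be at least 1")
    batches: List[Tuple[date, date]] = []
    current = start
    while current <= end:
        batch_end = min(current + timedelta(days=max_days - 1), end)
        batches.append((current, batch_end))
        current = batch_end + timedelta(days=1)
    return batches
```

With `max_days=4`, ten days of history (2024-01-01 to 2024-01-10) become three ranges instead of ten separate runs; the last range is shorter because it is clipped to the end date.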
## Background This idea was reported by @filiphornak. Currently, the record count is calculated this way if an SQL expression (rather than a table name) is used as an input to the...
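A common, engine-agnostic way to count the rows produced by an arbitrary SQL expression is to nest it in a `COUNT(*)` subquery so the database does the counting and only one number is returned. The sketch shows the pattern against SQLite using Python's standard `sqlite3` module; `count_rows` is a hypothetical helper, and this is not necessarily how Pramen computes record counts today or what the idea proposes.

```python
import sqlite3


def count_rows(conn: sqlite3.Connection, query: str) -> int:
    """Count the rows a SELECT returns by nesting it in a COUNT(*) subquery."""
    # A trailing semicolon would make the nested statement invalid.
    inner = query.strip().rstrip(";")
    # The query is interpolated as-is, so it must come from trusted configuration.
    (count,) = conn.execute(f"SELECT COUNT(*) FROM ({inner}) AS q").fetchone()
    return count
```

The subquery alias (`AS q`) is required by several engines for derived tables, so including it keeps the pattern portable.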
## Background The stability metric is computed based on the number of input and output dependencies: - `I` - number of input dependencies - the...
## Background Currently, fixtures used for testing Pramen sources, sinks, and transformations are in the test code only, and not published as part of an artifact. It would be helpful...