pramen
Use a more efficient record count when the JDBC source is used with an SQL query
Background
This idea was reported by @filiphornak.
Currently, when an SQL expression (rather than a table name) is used as the input to the JDBC source, the record count is calculated this way: https://github.com/AbsaOSS/pramen/blob/0f8040a8bad151eccf5a6ee3403b2ae9c6a24b9e/pramen/core/src/main/scala/za/co/absa/pramen/core/reader/TableReaderJdbc.scala#L129-L129
This is not always efficient, since Spark cannot always get the record count without fetching all records.
Feature
Use a more efficient record count when the JDBC source is used with an SQL query.
Proposed Solution
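Wrap the query in a COUNT(*) so that the database computes the count and returns a single row, instead of sending every record to Spark. The derived table needs an alias, since several databases (e.g. MySQL, SQL Server, and PostgreSQL before version 16) reject a subquery in FROM without one:
SELECT COUNT(*) AS CNT FROM (${query}) q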
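A minimal Scala sketch of the proposed count, assuming it is executed over a plain `java.sql.Connection` rather than through Spark's JDBC reader. The object `JdbcCountSketch`, the helpers `countQuery` and `countRecords`, and the alias `q` are hypothetical illustrations, not Pramen's actual API. The sketch gives the derived table a bare alias (no `AS`) because MySQL, SQL Server and older PostgreSQL versions require one, while Oracle accepts a bare alias but not `AS` before a table alias. It also strips a trailing semicolon, which would be a syntax error inside the parentheses.

```scala
import java.sql.Connection
import scala.util.Using

object JdbcCountSketch {
  // Hypothetical helper: wraps an arbitrary SQL query so that the database
  // computes the count itself and returns a single row.
  def countQuery(query: String): String = {
    // A trailing ';' is invalid inside a subquery, so remove it (and surrounding whitespace).
    val cleaned = query.trim.stripSuffix(";").trim
    // Bare alias 'q' for the derived table; required by several databases.
    s"SELECT COUNT(*) AS CNT FROM ($cleaned) q"
  }

  // Hypothetical helper: runs the count query over an already open JDBC connection
  // and closes the statement and result set afterwards.
  def countRecords(connection: Connection, query: String): Long =
    Using.resource(connection.createStatement()) { statement =>
      Using.resource(statement.executeQuery(countQuery(query))) { resultSet =>
        if (!resultSet.next())
          throw new IllegalStateException("COUNT(*) query returned no rows")
        // COUNT(*) yields exactly one row with one column.
        resultSet.getLong(1)
      }
    }
}
```

For example, `JdbcCountSketch.countQuery("SELECT id FROM t WHERE x > 1;")` produces `SELECT COUNT(*) AS CNT FROM (SELECT id FROM t WHERE x > 1) q`. One caveat worth checking: SQL Server rejects an `ORDER BY` inside a derived table unless `TOP`, `OFFSET` or `FOR XML` is also specified, so user queries that end with `ORDER BY` may need special handling there.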