Zach
Hello @ericabertone, I guess the "small" dataframe works because it's run on the driver and therefore is not serialized to the executors.
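As a generic illustration of the driver-vs-executor serialization point (a minimal sketch, unrelated to the specific objects in this issue): something that is not serializable can be used freely in driver-side code, but fails as soon as Spark has to ship it to executors.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical helper that does not extend Serializable
class Lookup {
  def resolve(id: Int): String = s"val_$id"
}

object SerializationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[2]").appName("sketch").getOrCreate()
    val lookup = new Lookup
    val data = Seq(1, 2, 3)

    // Driver-only: the closure runs on the driver, `lookup` is never serialized
    val driverSide = data.map(lookup.resolve)
    println(driverSide.mkString(", "))

    // Distributed: the closure is shipped to executors and the map call fails with
    // org.apache.spark.SparkException: Task not serializable
    spark.sparkContext.parallelize(data).map(lookup.resolve).collect()
  }
}
```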
In the description we have the following open point: how do we override the connection configuration used for Actions when executed on a remote agent? I have the following suggestion:...
Hi @Geheiner, thanks for the updated ideas. Some thoughts from my side. 1) I propose to introduce new top level objects RemoteAgent (or just Agent?) which hold the configuration how...
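A rough sketch of what such a top-level object could carry, just to make the proposal concrete (all names and fields below are assumptions for discussion, not a final design):

```scala
// Sketch only: a possible shape for a new top-level Agent/RemoteAgent object.
case class ConnectionConfig(id: String, url: String, authMode: Option[String] = None)

case class RemoteAgent(
  id: String,                                 // referenced by actions that should run remotely
  endpoint: String,                           // how the driver reaches the agent
  connections: Map[String, ConnectionConfig]  // connection overrides applied when actions execute on this agent
)

// An action executed on a given agent would resolve its connections against
// agent.connections first and fall back to the global connection definitions.
```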
We should also support Azure Synapse:
- the JDBC driver above might work (see the sketch below)
- for dedicated Synapse SQL pools there is an optimized connector: https://github.com/MicrosoftDocs/azure-docs.de-de/blob/master/articles/synapse-analytics/spark/synapse-spark-sql-pool-import-export.md
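For the first bullet, a quick sketch of the plain-JDBC option against a Synapse SQL endpoint (host, database and credentials are placeholders; untested assumption that the standard SQL Server driver is sufficient):

```scala
import org.apache.spark.sql.SparkSession

object SynapseJdbcSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("synapse-jdbc").getOrCreate()

    // Standard Spark JDBC read using the SQL Server driver against a Synapse endpoint
    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:sqlserver://<workspace>.sql.azuresynapse.net:1433;database=<db>")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
      .option("dbtable", "dbo.my_table")
      .option("user", "<user>")
      .option("password", "<password>")
      .load()

    df.show()
  }
}
```

The connector linked in the second bullet avoids row-by-row JDBC transfer for dedicated SQL pools, so it would presumably perform better for large loads.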
Alternatively this could be implemented as a Transformer, but it would need the primary key of the output data object. This could be achieved by adding the current action to the...
It would be better to configure rank cols/expressions in order to control which duplicate records are discarded...
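As an illustration of the rank-column idea (the config wiring and field names are assumptions), a rank expression evaluated per primary key could decide which duplicate survives:

```scala
import org.apache.spark.sql.{DataFrame, functions => f}
import org.apache.spark.sql.expressions.Window

object DeduplicateByRank {
  // `primaryKey` and `rankExpr` are assumed to come from the action/data object config.
  def deduplicate(df: DataFrame, primaryKey: Seq[String], rankExpr: String): DataFrame = {
    val window = Window
      .partitionBy(primaryKey.map(f.col): _*)
      .orderBy(f.expr(rankExpr).desc)      // e.g. "captured_ts": keep the newest record per key
    df.withColumn("_rank", f.row_number().over(window))
      .filter(f.col("_rank") === 1)        // discard all lower-ranked duplicates
      .drop("_rank")
  }
}
```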
Implemented the allowSchemaEvolution property for JdbcTableDataObject and DeltaLakeTableDataObject. Should we implement it as well for HiveTableDataObject (only relevant with SaveMode.overwrite)?
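For reference, a sketch of what schema evolution looks like with plain Delta Lake write options (illustrative only; assumption that allowSchemaEvolution roughly maps to this, the SDLB wiring itself may differ):

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

object SchemaEvolutionSketch {
  def appendWithEvolution(df: DataFrame, path: String): Unit = {
    df.write
      .format("delta")
      .mode(SaveMode.Append)
      .option("mergeSchema", "true")   // new columns in df are added to the table schema
      .save(path)
  }

  // For SaveMode.Overwrite the analogous Delta switch is option("overwriteSchema", "true"),
  // which replaces the table schema instead of merging into it.
}
```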
"Define expressions to check thresholds" could be implemented using #377 for Spark. From a performance perspective this would be optimal.
This is probably also linked with #43.
To be discussed in our weekly... Missing points from my side:
- create a documentation site for data quality?
- check integration of the original Spark metrics.