Zach

Results: 62 issues by Zach

**Is your feature request related to a problem? Please describe.** Delta Sharing is an open technology for sharing datasets, especially with Databricks (but not limited to Databricks). It would be...

enhancement

### What changes are included in the pull request? The schema of a DataObject can be defined from a case class. This PR enriches the schema with...

**Is your feature request related to a problem? Please describe.** Transforming data within a Database (input/output is the same database) with Spark is inefficient, as we need to read the...

enhancement

**Describe the bug** If only a schemaMin is defined for a CsvFileDataObject, with no schema and no sample file, and there are no real data files, then CsvFileDataObject will throw a...

bug

**Describe the bug** HistorizeAction with mergeModeEnable = true needs the input schema in the prepare phase; see the following stack trace: at io.smartdatalake.workflow.dataobject.CsvFileDataObject.getDataFrame(CsvFileDataObject.scala:68) at io.smartdatalake.workflow.action.HistorizeAction.initSaveModeOptions(HistorizeAction.scala:155) at io.smartdatalake.workflow.action.HistorizeAction.prepare(HistorizeAction.scala:189) This violates the principle that...

bug

**Is your feature request related to a problem? Please describe.** SDLB allows defining Primary/Foreign Keys for TableDataObjects, e.g. DeltaLakeTableDataObject. The Primary Key is used by Deduplicate/HistorizeAction. Foreign Keys are just...

enhancement

**Is your feature request related to a problem? Please describe.** It would be interesting to analyze dependencies at the column level, in order to - understand what transformations have been applied...

enhancement

**Is your feature request related to a problem? Please describe.** The existing AdditionalColumnTransformer can be used to add new columns to a DataFrame. Often, dropping or renaming columns is also...

enhancement

**Is your feature request related to a problem? Please describe.** StandardizeColNamesTransformer only handles top-level columns of a DataFrame, but not names in nested structures. Often this should be done with...

enhancement