
Ability to add columns to bronze tables similar to silver table query

Open · Lackshu opened this issue · 2 comments

It's not an issue but a feature request. This would be useful if we want to add the name of the source file or a processing timestamp to bronze tables. Example:

```scala
val df = spark.readStream.format("cloudFiles")
  .schema(schema)
  .option("cloudFiles.format", "csv")
  .option("cloudFiles.region", "ap-south-1")
  .load("path")
  .withColumn("filePath", input_file_name())
```

Lackshu · Oct 03 '23

I did this in my own cloned branch, inside read_dlt_cloud_files:

```python
from datetime import datetime

from pyspark.sql.functions import input_file_name, lit

(
    spark.readStream.format(bronze_dataflow_spec.sourceFormat)
    .options(**reader_config_options)
    .schema(schema)
    .load(source_path)
    .withColumn("_filePath", input_file_name())
    .withColumn("_loadDate", lit(datetime.now()))
)
```

I thought about making a pull request for this, but I didn't want it to include those two columns every time and wasn't sure of the most elegant way to give the user the option of including the two additional columns.
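One possible shape, as a minimal sketch only (the `add_audit_columns` flag and the helper below are hypothetical names used for illustration, not dlt-meta's actual API): gate the two extra columns behind an opt-in setting on the bronze dataflow spec.

```python
from datetime import datetime

from pyspark.sql.functions import input_file_name, lit


def with_optional_audit_columns(df, bronze_dataflow_spec):
    """Append the lineage columns only when the spec opts in (hypothetical flag)."""
    if getattr(bronze_dataflow_spec, "add_audit_columns", False):
        df = (
            df.withColumn("_filePath", input_file_name())
              .withColumn("_loadDate", lit(datetime.now()))
        )
    return df
```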

WilliamMize · Oct 05 '23

We can add bring-your-own-transformations functionality so that you can add columns.
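As a rough sketch of that idea only (the names below are illustrative and do not reflect the API dlt-meta eventually shipped), a bring-your-own-transformations hook could accept user functions that take and return a DataFrame and apply them after the raw read:

```python
from pyspark.sql import DataFrame
from pyspark.sql.functions import current_timestamp, input_file_name


def add_audit_columns(df: DataFrame) -> DataFrame:
    """Example user-supplied transformation that appends lineage columns."""
    return (
        df.withColumn("_filePath", input_file_name())
          .withColumn("_loadDate", current_timestamp())
    )


def apply_custom_transforms(df: DataFrame, transforms) -> DataFrame:
    """Framework-side hook: apply each registered transformation in order."""
    for transform in transforms:
        df = transform(df)
    return df
```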

ravi-databricks · Oct 16 '23

@Lackshu @WilliamMize Would this be covered now by the last release, which supports the features below?

- Added support for file metadata columns for Autoloader
- Bring your own custom transformations for the bronze/silver layer
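For reference, the Autoloader file metadata columns referred to here come from Spark's hidden `_metadata` struct on file-based sources; a minimal, generic sketch (plain Spark code rather than the dlt-meta configuration that enables it, assuming `spark`, `schema`, and `source_path` as in the snippets above):

```python
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .schema(schema)
    .load(source_path)
    .select("*", "_metadata.file_path", "_metadata.file_modification_time")
)
```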

ravi-databricks · Sep 13 '24

@ravi-databricks, the support for file metadata columns works for my scenario. Thanks.

Lackshu · Sep 17 '24

@ravi-databricks Thank you, I have implemented 0.0.8 and this now supports our use case.

WilliamMize · Oct 23 '24