Ability to add columns to bronze tables, similar to the silver table query
This is a feature request rather than a bug. It would be useful to be able to add columns such as the source file name or a processing timestamp. Example:

```scala
val df = spark.readStream.format("cloudFiles")
  .schema(schema)
  .option("cloudFiles.format", "csv")
  .option("cloudFiles.region", "ap-south-1")
  .load("path")
  .withColumn("filePath", input_file_name())
```
I did this in my own cloned branch. Inside of `read_dlt_cloud_files`:

```python
from datetime import datetime
from pyspark.sql.functions import input_file_name, lit

(
    spark.readStream.format(bronze_dataflow_spec.sourceFormat)
    .options(**reader_config_options)
    .schema(schema)
    .load(source_path)
    .withColumn("_filePath", input_file_name())
    .withColumn("_loadDate", lit(datetime.now()))
)
```
I thought about making a pull request for this, but I didn't want it to include those two columns every time, and I wasn't sure of the most elegant way to give the user the option of including the two additional columns.
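One possible way to make the columns opt-in (a sketch only; the flag names `include_file_path` and `include_load_date` are hypothetical, not an existing dlt-meta option) is to derive the list of audit columns from user-supplied flags and apply them in a loop:

```python
# Sketch: decide which audit columns to append based on hypothetical
# opt-in flags. The flag and column names are illustrative only.
def audit_columns(include_file_path: bool = False,
                  include_load_date: bool = False) -> list:
    """Return the names of the opt-in audit columns to add to the bronze table."""
    cols = []
    if include_file_path:
        cols.append("_filePath")
    if include_load_date:
        cols.append("_loadDate")
    return cols

# Inside read_dlt_cloud_files this could then drive the withColumn calls, e.g.:
# for name in audit_columns(**user_flags):
#     df = df.withColumn(name, ...)  # input_file_name() or lit(datetime.now())
```

This keeps the default behavior unchanged while letting users who want the extra columns turn them on explicitly.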
We can add bring-your-own-transformations functionality so that you can add columns.
@Lackshu @WilliamMize Would this be addressed now by the last release, which supports the features below?

- Added support for file metadata columns for Autoloader
- Bring your own custom transformations for the bronze/silver layer
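For reference, Spark's file sources (including Autoloader) expose a hidden `_metadata` struct column that can replace the hand-rolled `input_file_name()` approach. Selecting it in the bronze read would look roughly like the sketch below; the surrounding variables (`schema`, `source_path`) are assumed from the earlier snippet:

```python
# Fields available in the `_metadata` column of Spark file-source reads;
# selecting them materializes per-file metadata alongside the data columns.
METADATA_FIELDS = ["file_path", "file_name", "file_size", "file_modification_time"]

# Hedged sketch (requires a live Spark session, schema, and source path):
#
# df = (spark.readStream.format("cloudFiles")
#       .option("cloudFiles.format", "csv")
#       .schema(schema)
#       .load(source_path)
#       .select("*", "_metadata.file_path", "_metadata.file_modification_time"))
```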
@ravi-databricks, the support for file metadata columns works for my scenario. Thanks.
@ravi-databricks Thank you, I have upgraded to 0.0.8 and it now supports our use case.