public-datasets-pipelines
Support CSV column renaming without having to use custom scripts
Description
CSV column name remapping is a very common use case for data transforms.
To spare contributors from writing a custom script every time they need to rename CSV columns, we can build a column name mapping feature right into the YAML config file.
We can represent it as an operator such as:
dag:
  tasks:
    - operator: "RenameCSVHeadersOperator"
      source_csv: "gs://bucket/path/to/file.csv"
      mappings:
        - old_header: "DATE"
          new_header: "date"
        - old_header: "FACILITY_NAME"
          new_header: "facility_name"
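As a rough illustration of what such an operator might do internally, here is a minimal sketch using only Python's standard `csv` module. The function names `mappings_to_dict` and `rename_csv_headers` are hypothetical and not part of this repository, and the sketch works on local file-like objects rather than `gs://` paths. It assumes that columns without a mapping keep their original name, which is a design choice the proposal above does not specify.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def mappings_to_dict(mappings: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Turn the list of {old_header, new_header} pairs into a lookup table."""
    return {m["old_header"]: m["new_header"] for m in mappings}


def rename_csv_headers(
    source: TextIO, target: TextIO, mappings: Iterable[Dict[str, str]]
) -> List[str]:
    """Copy a CSV from source to target, renaming header cells per mappings.

    Hypothetical helper: headers with no mapping are passed through unchanged
    (an assumption). Returns the new header row.
    """
    lookup = mappings_to_dict(mappings)
    reader = csv.reader(source)
    writer = csv.writer(target, lineterminator="\n")

    header = next(reader, None)
    if header is None:
        # Empty input: nothing to rename, nothing to write.
        return []

    new_header = [lookup.get(name, name) for name in header]
    writer.writerow(new_header)
    # Stream the data rows through untouched.
    writer.writerows(reader)
    return new_header


if __name__ == "__main__":
    # Same mappings as in the YAML example above.
    mappings = [
        {"old_header": "DATE", "new_header": "date"},
        {"old_header": "FACILITY_NAME", "new_header": "facility_name"},
    ]
    src = io.StringIO("DATE,FACILITY_NAME,beds\n2021-01-01,General,10\n")
    dst = io.StringIO()
    rename_csv_headers(src, dst, mappings)
    print(dst.getvalue(), end="")
```

Only the first row is rewritten and the remaining rows are streamed as-is, so the approach should scale to large files without loading them into memory.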
Checklist
- [x] I created this issue in accordance with the Code of Conduct.
- [x] This issue is appropriately labeled.