public-datasets-pipelines

Support CSV column renaming without having to use custom scripts

Open · adlersantos opened this issue · 0 comments

Description

Renaming CSV columns (for example, lowercasing upstream headers such as `DATE` to `date`) is a very common step in data transforms.

So that contributors don't have to keep writing a custom script just to rename CSV columns, we can build a column-name mapping feature directly into the YAML config file.

We can represent it as an operator such as:

dag:
  tasks:
    - operator: "RenameCSVHeadersOperator"
      source_csv: "gs://bucket/path/to/file.csv"
      mappings:
        - old_header: "DATE"
          new_header: "date"
        - old_header: "FACILITY_NAME"
          new_header: "facility_name"
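A minimal sketch of the renaming logic such an operator might wrap is shown here. It assumes the CSV at `source_csv` has already been fetched into a local file-like object (the GCS download step is not shown), and the helpers `build_header_map` and `rename_csv_headers` are hypothetical names, not part of any existing API. Only the standard-library `csv` module is used, and `MAPPINGS` mirrors the YAML `mappings` list above.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Mirrors the `mappings` list from the YAML example (illustrative only).
MAPPINGS = [
    {"old_header": "DATE", "new_header": "date"},
    {"old_header": "FACILITY_NAME", "new_header": "facility_name"},
]


def build_header_map(mappings: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Turn a list of {old_header, new_header} pairs into an old -> new dict."""
    header_map: Dict[str, str] = {}
    for entry in mappings:
        old, new = entry["old_header"], entry["new_header"]
        if old in header_map:
            raise ValueError(f"duplicate mapping for column {old!r}")
        header_map[old] = new
    return header_map


def rename_csv_headers(
    src: TextIO, dst: TextIO, mappings: Iterable[Dict[str, str]]
) -> List[str]:
    """Copy CSV rows from src to dst, renaming only the header row.

    Columns without a mapping keep their original name. Returns the new header.
    """
    header_map = build_header_map(mappings)
    reader = csv.reader(src)
    writer = csv.writer(dst)
    try:
        header = next(reader)
    except StopIteration:
        return []  # empty input: there is no header to rename
    # Fail loudly if the config refers to columns the file does not have.
    missing = set(header_map) - set(header)
    if missing:
        raise ValueError(f"columns not found in CSV header: {sorted(missing)}")
    new_header = [header_map.get(col, col) for col in header]
    writer.writerow(new_header)
    writer.writerows(reader)  # data rows are streamed through unchanged
    return new_header


if __name__ == "__main__":
    src = io.StringIO("DATE,FACILITY_NAME,COUNT\n2021-05-10,Plant A,3\n")
    dst = io.StringIO()
    print(rename_csv_headers(src, dst, MAPPINGS))
    print(dst.getvalue())
```

Streaming the data rows through `writer.writerows(reader)` keeps memory use flat for large files, and rejecting unknown `old_header` values catches typos in the YAML config instead of silently skipping them.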

Checklist

  • [x] I created this issue in accordance with the Code of Conduct.
  • [x] This issue is appropriately labeled.

adlersantos, May 10 '21 17:05