
Investigate other exchange formats for config files

Open · YPares opened this issue 5 years ago · 0 comments

Currently, our config file can be written in JSON or YAML, and the pipeline can generate it automatically. That is nice, but given its inclusive philosophy, porcupine could attract more users by supporting other formats. I'm thinking about:

  • TOML: would be easy to add, and a good fit for pipelines with light configuration. However, TOML is too flat for workflows where we want to embed arbitrary input data in the config file (see https://github.com/tweag/porcupine/issues/47 for some details), so supporting those would require some thinking.
  • Avro/Thrift/Protobuf: would require more work, but since the virtual tree already contains all the type information, it is already feasible, and it would greatly improve a pipeline's ability to be called from an external tool.
  • Apache Arrow: related to https://github.com/tweag/porcupine/issues/9. This could be useful for pipelines that handle a lot of data, which could then be packed into one big Parquet/Arrow dataset.
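As a rough illustration of what "supporting other formats" could look like at the entry point, here is a minimal sketch that picks a config format from the file extension. The `ConfigFormat` type, `extensions`, `extensionOf` and `formatFor` are hypothetical names, not porcupine's actual API, and the actual decoding (which would be delegated to existing JSON/YAML/TOML libraries) is left out.

```haskell
module Main (main) where

import Data.Char (toLower)

-- Hypothetical enumeration (not porcupine's API): JSON and YAML are the
-- formats supported today, TOML is one of the proposed additions.
data ConfigFormat = JSON | YAML | TOML
  deriving (Show, Eq, Enum, Bounded)

-- Conventional (lower-case) file extensions for each format.
extensions :: ConfigFormat -> [String]
extensions JSON = ["json"]
extensions YAML = ["yaml", "yml"]
extensions TOML = ["toml"]

-- Lower-cased extension after the last '.', ignoring dots that only
-- appear in directory names (e.g. "dir.d/config" has no extension).
extensionOf :: FilePath -> Maybe String
extensionOf path =
  case break (== '.') (reverse path) of
    (revExt, '.' : _)
      | not (null revExt), '/' `notElem` revExt ->
          Just (map toLower (reverse revExt))
    _ -> Nothing

-- Map a config file path to a format; Nothing means "not supported".
formatFor :: FilePath -> Maybe ConfigFormat
formatFor path = do
  ext <- extensionOf path
  lookup ext [ (e, fmt) | fmt <- [minBound .. maxBound], e <- extensions fmt ]

main :: IO ()
main =
  mapM_ (\p -> putStrLn (p ++ " -> " ++ show (formatFor p)))
    ["pipeline.yaml", "pipeline.json", "pipeline.TOML", "pipeline.ini", "README"]
```

With this shape, adding a format amounts to one new constructor, its extensions, and a decoder for it; formats like Avro or Arrow would additionally need schema information, which the virtual tree already carries.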

YPares, Oct 16 '19 09:10