Investigate other exchange formats for config files
For now, our config file can be in JSON or YAML, which the pipeline can generate automatically. That is nice, but given its inclusive philosophy, porcupine could attract more users by supporting other formats. I'm thinking about:
- TOML: would be easy to add, and nice for pipelines with light configuration. However, it wouldn't fit workflows where we want to embed arbitrary input data in the config file (see https://github.com/tweag/porcupine/issues/47 for some details), because TOML is too flat. So it would require some thinking.
- Avro/Thrift/Protobuf: would require more work, but since the virtual tree contains all the type information, it is already feasible, and it would greatly improve a pipeline's ability to be called from an external tool.
- Apache Arrow: related to https://github.com/tweag/porcupine/issues/9. This could be useful for pipelines that use a lot of data, which could be packed into one big Parquet/Arrow dataset.
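To make the Avro/Thrift/Protobuf point more concrete, here is a minimal sketch of how a type description could be turned into a Protobuf (proto3) schema. It is written in Python for brevity, although porcupine itself is a Haskell library. The `Scalar`/`ListOf`/`Record` classes, the `to_proto` function and the scalar type mapping are all hypothetical stand-ins for the type information the virtual tree holds. They are not porcupine's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for the type info a virtual tree node could carry.
@dataclass
class Scalar:
    name: str  # a key of PROTO_SCALARS below


@dataclass
class ListOf:
    elem: FieldType


@dataclass
class Record:
    name: str
    fields: list[tuple[str, FieldType]]


FieldType = Union[Scalar, ListOf, Record]

# Assumed mapping from our scalar names to proto3 scalar types.
PROTO_SCALARS = {"int": "int64", "double": "double", "text": "string", "bool": "bool"}


def to_proto(root: Record) -> str:
    """Render a Record (and every Record it references) as a proto3 schema."""
    messages: list[str] = []  # rendered messages, dependencies first
    seen: set[str] = set()    # avoids rendering a message twice (or looping)

    def field_type(t: FieldType) -> str:
        if isinstance(t, Scalar):
            return PROTO_SCALARS[t.name]
        if isinstance(t, Record):
            render(t)  # make sure the nested message gets emitted too
            return t.name
        # Protobuf has no "repeated repeated", so a list of lists is rejected.
        raise ValueError("nested lists need a wrapper message in Protobuf")

    def render(rec: Record) -> None:
        if rec.name in seen:
            return
        seen.add(rec.name)
        lines = [f"message {rec.name} {{"]
        # Field numbers are assigned by position in the record.
        for number, (fname, ftype) in enumerate(rec.fields, start=1):
            if isinstance(ftype, ListOf):
                lines.append(f"  repeated {field_type(ftype.elem)} {fname} = {number};")
            else:
                lines.append(f"  {field_type(ftype)} {fname} = {number};")
        lines.append("}")
        messages.append("\n".join(lines))

    render(root)
    return 'syntax = "proto3";\n\n' + "\n\n".join(messages) + "\n"


# Example: a config holding a name, an iteration count and a list of points.
point = Record("Point", [("x", Scalar("double")), ("y", Scalar("double"))])
config = Record("Config", [
    ("name", Scalar("text")),
    ("iterations", Scalar("int")),
    ("points", ListOf(point)),
])
print(to_proto(config))
```

This prints a `Point` message followed by a `Config` message whose `points` field is `repeated Point points = 3;`. Assigning field numbers by position keeps the sketch short, but a real implementation would need stable field numbers, since reordering fields would otherwise break compatibility with existing clients.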