incubator-wayang
Feature/spark dataframes
Summary
- Add Spark Dataset/DataFrame plumbing: Parquet source/sink flag, channel conversions, optimizer cost hints.
- Document how to build dataset-backed pipelines (README.md, guides/spark-datasets.md).
Next steps / follow-ups
- ML4All pipelines still emit/consume raw double[]/DoubleRDDs. We should extend them to use DatasetChannels once schema handling is in place.
- Text/Object sources currently produce RDD channels. A Record-backed variant (or a conversion helper) would allow dataset output without extra user code.