distributed-dataset
Utility functions to read different file formats
Currently, we expect users to write a Conduit to read data from external sources. This is quite easy, but it would be even better to provide combinators for common formats and storage systems, e.g. JSON, CSV, gzip and Parquet for formats, and HDFS, S3 and HTTP for storage.
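As a rough illustration of what such a combinator could look like, here is a minimal sketch built on the `conduit` package. It turns a stream of text chunks into rows of fields. `csvRows` and `splitOn` are hypothetical names, not part of distributed-dataset, and the parsing is deliberately naive: it splits on commas and ignores quoting and escaping. A real CSV combinator would more likely wrap an existing Hackage library.

```haskell
import Conduit

-- Hypothetical helper: split a line on a separator character.
-- Naive on purpose: no support for quoted fields.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- Hypothetical combinator: re-chunk the input into lines (chunk
-- boundaries may fall anywhere), then split each line into fields.
csvRows :: Monad m => ConduitT String [String] m ()
csvRows = linesUnboundedC .| mapC (splitOn ',')

main :: IO ()
main =
  -- The second row is split across two input chunks.
  print (runConduitPure (yieldMany ["a,b\nc,", "d\n"] .| csvRows .| sinkList))
-- prints [["a","b"],["c","d"]]
```

A user would then compose `csvRows` after whatever conduit produces the raw data, instead of writing the line handling themselves.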
Almost all of them already have libraries on Hackage providing conduits we can use directly; however, we do not want to increase our dependency footprint much. So maybe we should hide them behind a flag, or create many small libraries (e.g. distributed-dataset-json, distributed-dataset-gzip).
Relevant: #8