
Utility functions to read different file formats

Open utdemir opened this issue 6 years ago • 0 comments

Currently, we expect users to write a Conduit to read data from external sources. This is quite easy, but it would be even better to provide combinators for common formats and storage systems, e.g. JSON, CSV, gzip and Parquet for formats, and HDFS, S3 and HTTP for storage.
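For illustration, here is a minimal sketch of what one such combinator could look like, assuming conduit >= 1.3 and bytestring. The names `csvRowsC` and `sourceCsvFile` are hypothetical and not part of distributed-dataset, and the CSV parsing is deliberately naive (no quoted fields).

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Hypothetical helpers sketching the kind of combinators proposed here.
-- Assumes the conduit (>= 1.3) and bytestring packages; these names do
-- not exist in distributed-dataset.
module Main (main) where

import Conduit
import Data.ByteString (ByteString)
import qualified Data.ByteString.Char8 as BC

-- | Naive CSV decoding: split the byte stream into lines, drop blank
-- lines, and split each line on commas. Quoted fields and escaped commas
-- are NOT handled; a real helper would delegate to a CSV library.
csvRowsC :: Monad m => ConduitT ByteString [ByteString] m ()
csvRowsC =
  linesUnboundedAsciiC
    .| filterC (not . BC.null)
    .| mapC (BC.split ',')

-- | Hypothetical convenience wrapper: stream rows from a local CSV file.
sourceCsvFile :: MonadResource m => FilePath -> ConduitT i [ByteString] m ()
sourceCsvFile path = sourceFile path .| csvRowsC

main :: IO ()
main = do
  -- Chunk boundaries deliberately fall mid-line ("al" / "ice") to show
  -- that linesUnboundedAsciiC reassembles lines across chunks.
  let rows = runConduitPure $
        yieldMany ["name,age\nal", "ice,30\n\nbob,25\n"]
          .| csvRowsC
          .| sinkList
  print rows
```

Formats and storage would then compose as ordinary conduit stages; for example, a gzipped file could be read with `sourceFile path .| ungzip .| csvRowsC` (with `ungzip` from conduit-extra's `Data.Conduit.Zlib`), run inside `runResourceT`.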

Almost all of them already have libraries on Hackage providing Conduits we can use directly; however, we do not want to increase our dependency footprint much. So we could either hide them behind a flag or create many small libraries (e.g. distributed-dataset-json, distributed-dataset-gzip).
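As a rough illustration of the flag option, a Cabal flag can make an optional dependency opt-in. The flag name `json` and the `WITH_JSON` CPP symbol below are hypothetical; a user would enable it with `cabal build -f json`.

```cabal
-- Hypothetical excerpt from distributed-dataset.cabal; the flag and
-- CPP symbol names are illustrative only.
flag json
  description: Build JSON readers (adds a dependency on aeson)
  default:     False
  manual:      True

library
  build-depends:  base, conduit
  if flag(json)
    build-depends: aeson
    cpp-options:   -DWITH_JSON
```

The trade-off is that flags change a package's API depending on how it was built, which downstream packages cannot easily declare a dependency on; separate small packages avoid that at the cost of more packages to maintain.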

Relevant: #8

utdemir, Jun 25 '19 10:06