Support More I/O Data Formats

Open • xunzhang opened this issue 10 years ago • 1 comment

Support more file formats, such as gzip/bzip2 (via pigz/pbzip2) and an internal sequence file format. In addition, could paracel support loading data from external storage, such as a key-value database, a relational database, or a specific low-level distributed file system?

xunzhang • Dec 18 '15 10:12

A good way to achieve this is a pluggable abstraction with two layers: a file-system-level wrapper and a format-level wrapper. Making both layers pluggable encourages code reuse and keeps it simple for developers to add a new file system or format. For example, the file-system layer could cover S3, HDFS, MFS, MySQL, PostgreSQL, Redis, Hive, and so on, while the format layer could cover CSV, gzip, Snappy, ORC, and so on.

xunzhang • Jun 30 '16 12:06
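
To make the two-layer idea above concrete, here is a minimal Python sketch of how a file-system wrapper and a format wrapper could be registered and combined independently. It is not paracel code: the registries, the `register_filesystem` / `register_format` decorators, the `load_lines` helper, and the in-memory `mem` scheme are all hypothetical names invented for illustration, and only the standard-library `gzip`, `bz2`, and `io` modules are used.

```python
import bz2
import gzip
import io
from typing import BinaryIO, Callable, Dict, Iterator

# Hypothetical registries (not part of paracel):
# scheme -> function that opens a byte stream, format -> function that decodes it.
FILESYSTEMS: Dict[str, Callable[[str], BinaryIO]] = {}
FORMATS: Dict[str, Callable[[BinaryIO], Iterator[str]]] = {}


def register_filesystem(scheme: str):
    """Decorator that plugs in a new file-system wrapper under a URI scheme."""
    def wrap(fn):
        FILESYSTEMS[scheme] = fn
        return fn
    return wrap


def register_format(name: str):
    """Decorator that plugs in a new format wrapper under a format name."""
    def wrap(fn):
        FORMATS[name] = fn
        return fn
    return wrap


@register_filesystem("file")
def open_local(path: str) -> BinaryIO:
    return open(path, "rb")


# Stand-in for a remote store (S3, HDFS, ...), assumed here only for the demo.
MEMORY_FILES = {"sample.gz": gzip.compress(b"a,1\nb,2\n")}


@register_filesystem("mem")
def open_memory(path: str) -> BinaryIO:
    return io.BytesIO(MEMORY_FILES[path])


@register_format("text")
def read_text(stream: BinaryIO) -> Iterator[str]:
    for line in io.TextIOWrapper(stream, encoding="utf-8"):
        yield line.rstrip("\n")


@register_format("gzip")
def read_gzip(stream: BinaryIO) -> Iterator[str]:
    with gzip.open(stream, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


@register_format("bzip2")
def read_bzip2(stream: BinaryIO) -> Iterator[str]:
    with bz2.open(stream, "rt", encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")


def load_lines(uri: str, fmt: str) -> Iterator[str]:
    """Open `uri` with the matching file-system wrapper, decode with `fmt`."""
    scheme, sep, path = uri.partition("://")
    if not sep:  # no scheme given: treat the whole string as a local path
        scheme, path = "file", uri
    with FILESYSTEMS[scheme](path) as raw:
        yield from FORMATS[fmt](raw)


lines = list(load_lines("mem://sample.gz", "gzip"))
```

Because the two registries are independent, adding a new storage backend makes every registered format readable from it, and adding a new format makes it readable from every backend, which is the reuse benefit the comment describes.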