paracel
Support More I/O Data Formats
Allow more file formats, such as gzip/bzip2 (pigz/pbzip2) and an internal sequence file format. In addition, could Paracel support loading data from other external storage, such as a key-value database, a relational database, or a specific low-level distributed file system?
A good way to achieve this is a pluggable abstraction with two layers: a file-system-level wrapper and a format-level wrapper. Keeping the two layers independent encourages code reuse, since any format can be read from any storage backend, and keeps it simple for developers to add a new file system or format. For example, the file system layer could include S3, HDFS, MFS, MySQL, PostgreSQL, Redis, Hive, and so on, while the format layer could include CSV, gzip, Snappy, ORC, and so on.