Support more file formats
Is your feature request related to a problem? Please describe. Blaze only supports the Parquet file format so far, through a customized implementation, even though DataFusion already implements a Parquet source.
Describe the solution you'd like Can we use the DataFusion reader interface? I think it would be easier to extend. By the way, DataFusion already provides multiple readers.
The customized ParquetExec is designed for reading data directly from HDFS via JNI (we don't use object-store or libhdfs because they are too hard to use in a production environment). I don't think DataFusion's reader interface outperforms the current ExecutionPlan/SendableRecordBatchStream implementation, and I'm not attracted to DataFusion's built-in formats (like CSV and JSON), as they are not widely used in Spark.
Yeah, I see. But it is hard to add a data source now, because there is no extension interface for that. If we used the DataFusion reader, we could easily add data sources such as Delta Lake, Avro, etc.
It would be hard. Different formats have a lot of specialized logic for reading data, such as pruning, data type conversion, delimiting, and so on. I don't have a good idea yet for how to design an input format interface.
Related to #498