wfdb-python Feature: Parquet Support

Open: ikarosilva opened this issue 2 years ago • 1 comment

Extend WFDB-python to include support for Apache Parquet:

https://arrow.apache.org/docs/cpp/parquet.html

Given that Parquet is a binary format and there are C++ libraries for it, I thought it might be easy to support at least some subset of Parquet (or, in fact, just to generate a header for Parquet files that makes a subset of Parquet files WFDB-compatible).

Benefit: facilitates computation on large datasets in AWS and Spark.

Jun 10 '22 16:06 ikarosilva

Since this is a WFDB format request, not a feature request for this package, can you please move the issue to: https://github.com/wfdb/wfdb-spec/

It would be good to add more details regarding:

Which files would be in Parquet format? Only the dat files?
How would this compare to the current binary files and FLAC files? For instance, if we stored the current binary dat files on an HDFS-compatible system such as S3, we could already read them with Spark.
Would this work with streaming reads/writes and with random access?
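
For reference on the random-access question, here is a minimal sketch of how the current binary layout already allows it, assuming a WFDB format 16 dat file (16-bit little-endian two's complement samples, interleaved across signals). The helper names write_format16, read_frames and demo are hypothetical and are not part of wfdb-python; the sketch uses only the Python standard library.

```python
import os
import struct
import tempfile

BYTES_PER_SAMPLE = 2  # assumption: WFDB format 16 (16-bit little-endian samples)


def write_format16(path, frames):
    """Write interleaved frames (one tuple of ints per time step) to a dat-style file."""
    with open(path, "wb") as f:
        for frame in frames:
            f.write(struct.pack("<%dh" % len(frame), *frame))


def read_frames(path, n_sig, start, count):
    """Random access: seek directly to frame `start` and read up to `count` frames."""
    frame_size = n_sig * BYTES_PER_SAMPLE
    with open(path, "rb") as f:
        f.seek(start * frame_size)  # byte offset is a simple product
        data = f.read(count * frame_size)
    n_frames = len(data) // frame_size  # fewer than `count` near end of file
    return [
        struct.unpack_from("<%dh" % n_sig, data, i * frame_size)
        for i in range(n_frames)
    ]


def demo():
    """Write 1000 two-signal frames, then read 3 frames starting at frame 500."""
    frames = [(i, -i) for i in range(1000)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.dat")
        write_format16(path, frames)
        return read_frames(path, n_sig=2, start=500, count=3)
```

Because every frame has a fixed size, any sample can be reached with a single seek. Parquet organizes data into row groups of column chunks, so a comparable read would typically work at row-group granularity rather than per sample, which is one of the trade-offs a spec proposal would need to address.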

Jun 15 '22 04:06 cx1111
