
Feature: Parquet Support

Open • ikarosilva opened this issue 2 years ago • 1 comment

Extend WFDB-python to include support for Apache Parquet:

https://arrow.apache.org/docs/cpp/parquet.html

Given that Parquet is a binary format and C++ libraries for it already exist, I thought it might be easy to support at least a subset of Parquet (or even just to generate a header for Parquet files that makes a subset of them WFDB-compatible).

Benefit: easier computation on AWS and Spark for large datasets.

ikarosilva, Jun 10 '22 16:06

Since this is a request to change the WFDB format rather than a feature request for this package, could you please move the issue to: https://github.com/wfdb/wfdb-spec/

It would be good to add more details regarding:

  • Which files would be in Parquet format? Only the .dat files?
  • How this would compare to the current binary files and FLAC files. For instance, if we stored the current binary dat files on an HDFS-compatible system such as S3, we could already read them with Spark.
  • Would this support streaming read/write and random access?
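
As a baseline for the random-access question, here is a minimal sketch of how the existing uncompressed binary .dat files already allow it. It assumes every signal uses WFDB format 16 (little-endian 16-bit two's-complement samples), one sample per signal per frame, and no byte offset in the header; `read_frames_fmt16` is a hypothetical helper, not part of the wfdb-python API. Under those assumptions, frame i starts at byte i * 2 * n_sig, so any file-like object that supports `seek` (a local file or, for example, a ranged-read wrapper around an S3 object) can jump directly to it. Compressed formats such as the FLAC-based ones would need a different approach.

```python
import io
import struct
from typing import BinaryIO, List


def read_frames_fmt16(f: BinaryIO, n_sig: int, start: int, count: int) -> List[List[int]]:
    """Read up to `count` frames starting at frame `start` from a format-16 stream.

    Assumptions (illustrative only): all signals are format 16, one sample
    per signal per frame, no byte offset. Each frame is n_sig int16 values.
    """
    frame_size = 2 * n_sig                 # bytes per frame
    f.seek(start * frame_size)             # random access: jump straight to the frame
    raw = f.read(count * frame_size)
    n_frames = len(raw) // frame_size      # fewer frames if we hit end of file
    values = struct.unpack(f"<{n_frames * n_sig}h", raw[: n_frames * frame_size])
    # Regroup the interleaved samples into one list per frame
    return [list(values[i * n_sig:(i + 1) * n_sig]) for i in range(n_frames)]


# Usage: an in-memory stand-in for a two-signal .dat file with four frames
samples = [0, 100, 1, 101, 2, 102, 3, -3]
buf = io.BytesIO(struct.pack(f"<{len(samples)}h", *samples))
print(read_frames_fmt16(buf, n_sig=2, start=2, count=2))  # [[2, 102], [3, -3]]
```

A Parquet-based layout would instead store each signal as a column, which favours reading whole channels over reading a time window across all channels; how that trade-off fits streaming read/write is part of what the spec discussion would need to settle.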

cx1111, Jun 15 '22 04:06