wfdb-python
Feature: Parquet Support
Extend WFDB-python to include support for Apache Parquet:
https://arrow.apache.org/docs/cpp/parquet.html
Since Parquet is a binary format and C++ libraries for it already exist, I thought it might be easy to support at least a subset of Parquet (or even just generate a header that makes a subset of Parquet files WFDB-compatible).
Benefit: Facilitate computation on large datasets in AWS and Spark
Since this is a WFDB format request, not a feature request for this package, can you please move the issue to: https://github.com/wfdb/wfdb-spec/
It would be good to add more details regarding:
- Which files would be in Parquet format? Only the dat files?
- How this would compare to the current binary files and FLAC files. For instance, if we stored the current binary dat files on an HDFS-compatible system such as S3, we could already read them with Spark.
- Would this work with stream read/write, and random access?
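For comparison, random access in the current binary files is a plain byte seek. The sketch below is only an illustration, assuming a format-16 dat file (little-endian 16-bit samples, interleaved across signals) with no byte offset in the header; `read_samples` is a hypothetical helper, not part of the wfdb package.

```python
import struct
import tempfile


def read_samples(path, n_sig, start, count):
    """Hypothetical helper: read `count` frames starting at frame `start`
    from a format-16 dat file holding `n_sig` interleaved signals."""
    bytes_per_frame = 2 * n_sig  # assumption: one little-endian int16 per signal
    with open(path, "rb") as f:
        f.seek(start * bytes_per_frame)  # jump straight to the requested frame
        raw = f.read(count * bytes_per_frame)
    n_frames = len(raw) // bytes_per_frame  # tolerate a short read at end of file
    flat = struct.unpack(f"<{n_frames * n_sig}h", raw[: n_frames * bytes_per_frame])
    # De-interleave into one tuple per frame
    return [flat[i * n_sig:(i + 1) * n_sig] for i in range(n_frames)]


# Build a small two-signal file with made-up values to exercise the helper.
with tempfile.NamedTemporaryFile(suffix=".dat", delete=False) as tmp:
    for i in range(5):
        tmp.write(struct.pack("<2h", i, 100 + i))
    demo_path = tmp.name

frames = read_samples(demo_path, n_sig=2, start=3, count=2)
print(frames)
```

A Parquet proposal would need to say how it matches this: Parquet files are organized into row groups, so reading a small sample range generally means locating and decoding the row group that contains it rather than seeking to an exact byte offset.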