
New interface for specifying different data sources for read/write

Open cx1111 opened this issue 2 years ago • 0 comments

Taking the current rdrecord as an example, two parameters specify the location of the record:

  1. record_name : str
  2. pn_dir : str

The current package supports reading files locally and from the global database index URL, which defaults to PhysioNet, as specified in download.py.

There are several things that we should aim to support:

  • Reading/writing from more types of data sources, such as S3 and GCS.
  • Having more than one remote source configured at a time.

One proposal is to add a new DataSource class and a global config dictionary that maps each data source name (str) to a DataSource object, e.g.:

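from dataclasses import dataclass
from enum import Enum
from typing import Optional

class DataSourceType(Enum):
    LOCAL = 1  # Not sure if this is necessary?
    HTTP = 2
    GCS = 3
    S3 = 4

@dataclass
class DataSource:
    ds_type: DataSourceType
    # Other type-specific params here, e.g. the base URL for HTTP sources
    base_url: Optional[str] = None

_physionet_ds = DataSource(ds_type=DataSourceType.HTTP, base_url="https://physionet.org/content/")

# Global map of data source name -> DataSource
data_sources = {"physionet": _physionet_ds}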

The read/write functions could then take these parameters:

  1. record_name: str
  2. data_source: str | DataSource - The key of the data source in the global data sources map, or a DataSource object.
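As a rough sketch of how a read function might accept either form of data_source, something like the following could work. The names resolve_data_source and record_location, and the base_url and base_dir fields, are hypothetical and not part of the current package. The way the location is built here is only illustrative; real URL layouts may differ.

```python
import os
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class DataSourceType(Enum):
    LOCAL = 1
    HTTP = 2


@dataclass(frozen=True)
class DataSource:
    ds_type: DataSourceType
    base_url: Optional[str] = None  # hypothetical field, used by HTTP sources
    base_dir: Optional[str] = None  # hypothetical field, used by LOCAL sources


# Global registry keyed by data source name, as in the proposal.
data_sources = {
    "physionet": DataSource(
        DataSourceType.HTTP, base_url="https://physionet.org/content/"
    ),
    "local": DataSource(DataSourceType.LOCAL, base_dir="."),
}


def resolve_data_source(data_source: Union[str, DataSource]) -> DataSource:
    """Accept a registry key or a DataSource object and return a DataSource."""
    if isinstance(data_source, DataSource):
        return data_source
    try:
        return data_sources[data_source]
    except KeyError:
        raise ValueError(f"Unknown data source: {data_source!r}") from None


def record_location(record_name: str, data_source: Union[str, DataSource]) -> str:
    """Hypothetical helper: build the location a reader would fetch from."""
    ds = resolve_data_source(data_source)
    if ds.ds_type is DataSourceType.HTTP:
        return ds.base_url.rstrip("/") + "/" + record_name
    if ds.ds_type is DataSourceType.LOCAL:
        return os.path.join(ds.base_dir, record_name)
    raise NotImplementedError(f"Unsupported data source type: {ds.ds_type}")
```

A caller could then pass either a key or an object, for example record_location("mitdb/100", "physionet") or record_location("100", DataSource(DataSourceType.HTTP, base_url="https://example.org/data")). Registering a second remote source would just be another entry in data_sources.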

This would be much more explicit. Thoughts?

cx1111, Apr 30 '22 01:04