wfdb-python

Accessing a file in a subfolder on PhysioNet using pn_dir

Open alistairewj opened this issue 7 months ago • 2 comments

I'm using wfdb 4.1.2, and when I try to run this code it fails with a 404:

import wfdb
record = wfdb.rdrecord('p1003/p10030753/s40735970/40735970', pn_dir='mimic-iv-ecg')

NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mimic-iv-ecg/1.0/40735970.hea

This looks related to the hacky path parsing, for which there is an open (semi-complete) PR (#346).

I also tried:

import wfdb
record = wfdb.rdrecord('40735970', pn_dir='mimic-iv-ecg/1.0/p1003/p10030753/s40735970')

NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mimic-iv-ecg/1.0/p1003/p10030753/s40735970/40735970.hea

So I played with the path, and this ends up working:

import wfdb
record = wfdb.rdrecord('40735970', pn_dir='mimic-iv-ecg/1.0/files/p1003/p10030753/s40735970')

It seems a bit unintuitive to me that you have to put "files" in the path. Am I doing something wrong?
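
Judging only from the 404 messages above, the header URL appears to be assembled as the PhysioNet files root, then pn_dir, then the record name plus ".hea". The sketch below is not wfdb's actual code: build_header_url is a hypothetical helper, and the URL layout is an assumption inferred from those messages. It shows where the second attempt loses the "files" segment.

```python
# Hypothetical helper, not part of wfdb: mirrors the URL pattern seen in the
# 404 messages (assumed layout: <root>/<pn_dir>/<record_name>.hea).
PN_FILES_ROOT = "https://physionet.org/files"


def build_header_url(pn_dir, record_name):
    """Return the header URL implied by pn_dir and record_name."""
    return f"{PN_FILES_ROOT}/{pn_dir.strip('/')}/{record_name}.hea"


record_dir = "p1003/p10030753/s40735970"

# Second attempt: no "files" segment, so the URL matches the 404 above.
failing_url = build_header_url(f"mimic-iv-ecg/1.0/{record_dir}", "40735970")

# Third attempt: "files" is apparently a folder inside the project itself,
# so it has to appear in pn_dir after the version number.
working_url = build_header_url(f"mimic-iv-ecg/1.0/files/{record_dir}", "40735970")

print(failing_url)
print(working_url)
```

If that assumption holds, the "files" right after physionet.org is the site's download root, while the second "files" is a directory that belongs to the MIMIC-IV-ECG project.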

alistairewj, Nov 14 '23 10:11

I also struggle with correctly setting the path when using rdrecord etc.

Updating the documentation with examples from the most complicated cases would help. I find that the link to a file on the PhysioNet project page is a reliable guide, but it might not be intuitive to everyone to go there to work out what pn_dir should be.
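
As a rough illustration of the "copy the file link" approach, the sketch below splits a PhysioNet file URL into a pn_dir and a record name. split_physionet_file_url is a hypothetical helper, not a wfdb function, and the example URL is assumed from the working call earlier in this thread rather than checked against the site.

```python
import posixpath
from urllib.parse import urlparse


def split_physionet_file_url(url):
    """Split a PhysioNet file link into (pn_dir, record_name).

    Hypothetical helper: assumes links look like
    https://physionet.org/files/<project>/<version>/<folders...>/<record>.<ext>
    """
    path = urlparse(url).path
    prefix = "/files/"
    if not path.startswith(prefix):
        raise ValueError(f"not a PhysioNet file link: {url}")
    # Everything after the site's /files/ root is project-relative.
    relative = path[len(prefix):]
    directory, filename = posixpath.split(relative)
    # Drop the extension (.hea, .dat, ...) to get the record name.
    record_name = posixpath.splitext(filename)[0]
    return directory, record_name


url = (
    "https://physionet.org/files/mimic-iv-ecg/1.0/"
    "files/p1003/p10030753/s40735970/40735970.hea"
)
pn_dir, record_name = split_physionet_file_url(url)
print(pn_dir, record_name)
```

The two values could then be passed along as wfdb.rdrecord(record_name, pn_dir=pn_dir), which is the form that worked above.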

Ultimately, it'd be great if functions like get_record_list worked cleanly with the record_name and pn_dir arguments of rdrecord etc. This isn't the case now. For a project like https://physionet.org/content/mimic4wdb/0.1.0/ with multiple levels of RECORDS files, get_record_list returns the paths found in the top-level RECORDS file. If you run it again with one of those paths appended to db_dir, you get the contents of the bottom-level RECORDS file (in this project, anyhow). I think documenting this would be helpful.

However, get_record_list never produces a clean record_name and pn_dir pair (at least not for this project and similarly designed projects such as MIMIC-IV-ECG). In this case it returns the last folder joined with the record_name, so the user has to split the string to get just the record_name. It would help if we updated get_record_list or introduced a new function (e.g. get_pn_dir_record_name).
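
A minimal sketch of what such a helper might look like is below. get_pn_dir_record_name does not exist in wfdb; the name is just the one floated above. It also assumes each returned entry is a slash-separated path relative to db_dir whose last component is the record name, which may not hold for every project.

```python
import posixpath


def get_pn_dir_record_name(db_dir, record_entry):
    """Split a record-list entry into (pn_dir, record_name).

    Hypothetical helper, not part of wfdb. Assumes record_entry is a path
    relative to db_dir and that its final component is the record name.
    """
    full_path = posixpath.join(db_dir.strip("/"), record_entry.strip("/"))
    pn_dir, record_name = posixpath.split(full_path)
    return pn_dir, record_name


# Entry format assumed for illustration, based on the paths in this thread.
entry = "files/p1003/p10030753/s40735970/40735970"
pn_dir, record_name = get_pn_dir_record_name("mimic-iv-ecg/1.0", entry)
print(pn_dir)
print(record_name)
```

For entries that contain only the last folder and the record name, the same split applies as long as db_dir already includes the parent folders.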

briangow, Nov 14 '23 14:11

Side note: I would like to move the logic for interacting with the PhysioNet platform into https://github.com/MIT-LCP/physionet, so that WFDB can focus on file formats and analysis tools.

tompollard, Nov 16 '23 16:11