Make wfdb.rdann, wfdb.rdrecord, wfdb.rdsamp also accept pathlib.Path objects as the input record_name
Currently, the functions wfdb.rdann, wfdb.rdrecord, wfdb.rdsamp only accept str for the input record_name, as they have str specific operations. wfdb.rdheader currently accepts pathlib.Path objects as input record_name since the os.path module is able to handle pathlib.Path as well as str type input.
For example, on a linux machine
import pathlib, wfdb
sp = "/home/wenhao/Jupyter/wenhao/data/CPSC2021/training_I/data_0_1"
pp = pathlib.Path(sp)
wfdb.rdrecord(pp)
would raise error
AttributeError Traceback (most recent call last)
<ipython-input-6-754a0c10c458> in <module>
----> 1 wfdb.rdrecord(pp)
~/.local/lib/python3.6/site-packages/wfdb/io/record.py in rdrecord(record_name, sampfrom, sampto, channels, physical, pn_dir, m2s, smooth_frames, ignore_skew, return_res, force_channels, channel_names, warn_empty)
3426 pn_dir = posixpath.join(dir_list[0], get_version(dir_list[0]), *dir_list[1:])
3427
-> 3428 if record_name.endswith('.edf'):
3429 record = edf2mit(record_name, pn_dir=pn_dir, record_only=True)
3430 elif record_name.endswith('.wav'):
AttributeError: 'PosixPath' object has no attribute 'endswith'
Sometimes one would like to work locally with pathlib, which was introduced in Python 3.4 (PEP 428), to better handle the file paths.
Thank you! This looks like a great improvement.
I'm not entirely familiar with pathlib so I'm a little unsure of some of the details.
One thing to be wary of is that "file names" or "record names" might be either filesystem paths or URL paths. Some existing functions permit record_name to include a subdirectory when streaming data from PhysioNet; other functions don't allow that, and silently ignore any directory name in record_name (which I consider a bug.)
Here, for example, it looks like you're passing a pathlib.Path object to download._stream_annotation. Is that a good idea or not? I guess that posixpath.join will automatically convert the argument back into a string, but will it correctly use slashes if running on a Windows platform? And does that mean that rdann would now accept backslashes in URLs, only on Windows?
If we do want the download functions to accept a pathlib.Path as an alternative to str then that should be documented.
I'm not necessarily opposed to either saying "pathlib.Path may be used only for local files" or saying "pathlib.Path may be used either for local files or for URLs", but I do think it's important to handle URL arguments in the same way on all hosts.
Thank you! This looks like a great improvement.
I'm not entirely familiar with
pathlibso I'm a little unsure of some of the details.One thing to be wary of is that "file names" or "record names" might be either filesystem paths or URL paths. Some existing functions permit
record_nameto include a subdirectory when streaming data from PhysioNet; other functions don't allow that, and silently ignore any directory name inrecord_name(which I consider a bug.)Here, for example, it looks like you're passing a
pathlib.Pathobject todownload._stream_annotation. Is that a good idea or not? I guess thatposixpath.joinwill automatically convert the argument back into a string, but will it correctly use slashes if running on a Windows platform? And does that mean that rdann would now accept backslashes in URLs, only on Windows?If we do want the
downloadfunctions to accept apathlib.Pathas an alternative tostrthen that should be documented.I'm not necessarily opposed to either saying "
pathlib.Pathmay be used only for local files" or saying "pathlib.Pathmay be used either for local files or for URLs", but I do think it's important to handle URL arguments in the same way on all hosts.
You're correct. On a Windows system, if you pass a Path object (actually a WindowsPath object) for the file_name, then error might occur when this file_name includes a subdirectory for
url = posixpath.join(config.db_index_url, pn_dir, file_name)
in the function download._stream_annotation. Such error can be avoided by forcing converting file_name to a PosixPath using pathlib.PurePosixPath as
url = posixpath.join(config.db_index_url, pn_dir, pathlib.PurePosixPath(file_name))
A drawback of pathlib is that it cannot handle the URLs correctly (mainly the double slashes), for example
>>> pathlib.PurePosixPath('https://physionet.org/files/')
PurePosixPath('https:/physionet.org/files')
>>> db_index_url = 'https://physionet.org/files/'
>>> sp = "./temp/13918_AV_psd.mat"
>>> pp = pathlib.Path(sp)
>>> db_index_url / pathlib.PurePosixPath("mitdb" / pp)
PurePosixPath('https:/physionet.org/files/mitdb/temp/13918_AV_psd.mat')
The following image is an example tested on a Windows machine

One thing to be wary of is that "file names" or "record names" might be either filesystem paths or URL paths. Some existing functions permit record_name to include a subdirectory when streaming data from PhysioNet; other functions don't allow that, and silently ignore any directory name in record_name (which I consider a bug.)
I also think this is a bug:
>>> wfdb.rdrecord("x_mitdb/x_111", pn_dir="mitdb", sampto=2000)
NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mitdb/1.0.0/x_111.hea
>>> wfdb.rdrecord("x_111", pn_dir="mitdb/1.0.0/x_mitdb/", sampto=2000)
<wfdb.io.record.Record at 0x7f6cd4ee7c50>
I think there's a lot of work to do to better deal with the paths. For example, I found currently wfdb detects whether a pn_dir contains the version number using
'/' not in db_dir
or
'.' not in pn_dir
I think it's better to specify a pattern for the database version and use re.search to detect it
DB_VERSION_PATTERN = re.compile("\d+.\d+\.\d+")
Thanks very much for identifying the bugs. We're going to think a bit more about how we want the API to be in the next major version, such that the behavior is more explicit. Especially when adding cloud data sources.
Thanks very much for identifying the bugs. We're going to think a bit more about how we want the API to be in the next major version, such that the behavior is more explicit. Especially when adding cloud data sources.
I agree with you. I tried continuing to correct (enhance) path operations in other places for several hours but finally gave up. I hope that path operations would be designed in a more uniform and modern way in the next major version.