wfdb-python
wfdb-python copied to clipboard
Make wfdb.rdann, wfdb.rdrecord, wfdb.rdsamp also accept pathlib.Path objects as the input record_name
Currently, the functions wfdb.rdann
, wfdb.rdrecord
, wfdb.rdsamp
only accept str
for the input record_name
, as they have str
specific operations. wfdb.rdheader
currently accepts pathlib.Path
objects as input record_name
since the os.path
module is able to handle pathlib.Path
as well as str
type input.
For example, on a linux machine
import pathlib, wfdb
sp = "/home/wenhao/Jupyter/wenhao/data/CPSC2021/training_I/data_0_1"
pp = pathlib.Path(sp)
wfdb.rdrecord(pp)
would raise error
AttributeError Traceback (most recent call last)
<ipython-input-6-754a0c10c458> in <module>
----> 1 wfdb.rdrecord(pp)
~/.local/lib/python3.6/site-packages/wfdb/io/record.py in rdrecord(record_name, sampfrom, sampto, channels, physical, pn_dir, m2s, smooth_frames, ignore_skew, return_res, force_channels, channel_names, warn_empty)
3426 pn_dir = posixpath.join(dir_list[0], get_version(dir_list[0]), *dir_list[1:])
3427
-> 3428 if record_name.endswith('.edf'):
3429 record = edf2mit(record_name, pn_dir=pn_dir, record_only=True)
3430 elif record_name.endswith('.wav'):
AttributeError: 'PosixPath' object has no attribute 'endswith'
Sometimes one would like to work locally with pathlib
, which was introduced in Python 3.4 (PEP 428), to better handle the file paths.
Thank you! This looks like a great improvement.
I'm not entirely familiar with pathlib
so I'm a little unsure of some of the details.
One thing to be wary of is that "file names" or "record names" might be either filesystem paths or URL paths. Some existing functions permit record_name
to include a subdirectory when streaming data from PhysioNet; other functions don't allow that, and silently ignore any directory name in record_name
(which I consider a bug.)
Here, for example, it looks like you're passing a pathlib.Path
object to download._stream_annotation
. Is that a good idea or not? I guess that posixpath.join
will automatically convert the argument back into a string, but will it correctly use slashes if running on a Windows platform? And does that mean that rdann would now accept backslashes in URLs, only on Windows?
If we do want the download
functions to accept a pathlib.Path
as an alternative to str
then that should be documented.
I'm not necessarily opposed to either saying "pathlib.Path
may be used only for local files" or saying "pathlib.Path
may be used either for local files or for URLs", but I do think it's important to handle URL arguments in the same way on all hosts.
Thank you! This looks like a great improvement.
I'm not entirely familiar with
pathlib
so I'm a little unsure of some of the details.One thing to be wary of is that "file names" or "record names" might be either filesystem paths or URL paths. Some existing functions permit
record_name
to include a subdirectory when streaming data from PhysioNet; other functions don't allow that, and silently ignore any directory name inrecord_name
(which I consider a bug.)Here, for example, it looks like you're passing a
pathlib.Path
object todownload._stream_annotation
. Is that a good idea or not? I guess thatposixpath.join
will automatically convert the argument back into a string, but will it correctly use slashes if running on a Windows platform? And does that mean that rdann would now accept backslashes in URLs, only on Windows?If we do want the
download
functions to accept apathlib.Path
as an alternative tostr
then that should be documented.I'm not necessarily opposed to either saying "
pathlib.Path
may be used only for local files" or saying "pathlib.Path
may be used either for local files or for URLs", but I do think it's important to handle URL arguments in the same way on all hosts.
You're correct. On a Windows system, if you pass a Path
object (actually a WindowsPath
object) for the file_name
, then error might occur when this file_name
includes a subdirectory for
url = posixpath.join(config.db_index_url, pn_dir, file_name)
in the function download._stream_annotation
. Such error can be avoided by forcing converting file_name
to a PosixPath
using pathlib.PurePosixPath
as
url = posixpath.join(config.db_index_url, pn_dir, pathlib.PurePosixPath(file_name))
A drawback of pathlib
is that it cannot handle the URLs correctly (mainly the double slashes), for example
>>> pathlib.PurePosixPath('https://physionet.org/files/')
PurePosixPath('https:/physionet.org/files')
>>> db_index_url = 'https://physionet.org/files/'
>>> sp = "./temp/13918_AV_psd.mat"
>>> pp = pathlib.Path(sp)
>>> db_index_url / pathlib.PurePosixPath("mitdb" / pp)
PurePosixPath('https:/physionet.org/files/mitdb/temp/13918_AV_psd.mat')
The following image is an example tested on a Windows machine
One thing to be wary of is that "file names" or "record names" might be either filesystem paths or URL paths. Some existing functions permit record_name to include a subdirectory when streaming data from PhysioNet; other functions don't allow that, and silently ignore any directory name in record_name (which I consider a bug.)
I also think this is a bug:
>>> wfdb.rdrecord("x_mitdb/x_111", pn_dir="mitdb", sampto=2000)
NetFileNotFoundError: 404 Error: Not Found for url: https://physionet.org/files/mitdb/1.0.0/x_111.hea
>>> wfdb.rdrecord("x_111", pn_dir="mitdb/1.0.0/x_mitdb/", sampto=2000)
<wfdb.io.record.Record at 0x7f6cd4ee7c50>
I think there's a lot of work to do to better deal with the paths. For example, I found currently wfdb
detects whether a pn_dir
contains the version number using
'/' not in db_dir
or
'.' not in pn_dir
I think it's better to specify a pattern for the database version and use re.search
to detect it
DB_VERSION_PATTERN = re.compile("\d+.\d+\.\d+")
Thanks very much for identifying the bugs. We're going to think a bit more about how we want the API to be in the next major version, such that the behavior is more explicit. Especially when adding cloud data sources.
Thanks very much for identifying the bugs. We're going to think a bit more about how we want the API to be in the next major version, such that the behavior is more explicit. Especially when adding cloud data sources.
I agree with you. I tried continuing to correct (enhance) path operations in other places for several hours but finally gave up. I hope that path operations would be designed in a more uniform and modern way in the next major version.