wfdb-python
Handling WFDB header date and time problems
We should aim to prevent the creation of WFDB headers which aren't to specification (via the WFDB write functions). We should also provide clear messaging for errors related to already created WFDB headers (via the WFDB read functions).
Two sets of projects have had WFDB header issues related to base_date recently:
1. 2020 and 2021 Challenges. These projects used -'s instead of /'s in the base_date, and the date was given before the base_time: https://github.com/MIT-LCP/wfdb-python/issues/351. As mentioned in that issue, the WFDB tools were not used to create these files, so these erroneous headers shouldn't come as a surprise.
   - Error message: HeaderSyntaxError: invalid syntax in record line
2. Cerebral projects. These projects had a year for the base_date and no time. This problem was initially noted in: https://github.com/MIT-LCP/wfdb-python/issues/307. More recently, another user sent an email about this issue.
   - Error message: ValueError: time data '2006' does not match format '%d/%m/%Y' (ECG files in https://physionet.org/content/cerebral-perfusion-diabetes/1.0.0/)
   - Error message: ValueError: unconverted data remains: 10 (ECG files in https://physionet.org/content/cded/1.0.0/)
It isn't clear how the WFDB files for 2. were created. There is some indication that MIT-LCP helped generate these files.
At the very least we should:
- Use validation checks to prevent the creation of WFDB files with these issues in our WFDB toolboxes.
- Catch these issues in all of our WFDB toolboxes and provide clear, consistent error messages.
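The validation check in the first item could look roughly like the sketch below. It is not the wfdb-python implementation: `check_base_datetime` is a hypothetical helper, and the accepted patterns (DD/MM/YYYY for the date, HH:MM:SS with an optional fractional part for the time, no date without a time) are my reading of the header specification, so treat them as assumptions.

```python
import datetime
import re

# Hypothetical helper for illustration; not part of the wfdb package.
# Assumed formats: base_date is DD/MM/YYYY, base_time is HH:MM:SS[.fff].
_DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")
_TIME_RE = re.compile(r"\d{1,2}:\d{1,2}:\d{1,2}(\.\d+)?")


def check_base_datetime(base_time, base_date):
    """Return a list of problems found in the base_time and base_date strings.

    Either argument may be None when the field is absent from the header.
    An empty list means no problem was detected.
    """
    problems = []
    # Assumption: a base date is only allowed when a base time is present.
    if base_date is not None and base_time is None:
        problems.append("base_date is present but base_time is missing")
    if base_time is not None and not _TIME_RE.fullmatch(base_time):
        problems.append(f"base_time {base_time!r} is not in HH:MM:SS format")
    if base_date is not None:
        if not _DATE_RE.fullmatch(base_date):
            problems.append(f"base_date {base_date!r} is not in DD/MM/YYYY format")
        else:
            # The shape is right; now confirm it is a real calendar date.
            try:
                datetime.datetime.strptime(base_date, "%d/%m/%Y")
            except ValueError:
                problems.append(f"base_date {base_date!r} is not a valid calendar date")
    return problems
```

A write function could refuse to create the header whenever the returned list is non-empty, and a checker could report the same messages for existing files.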
Since we need to address multiple issues across 3 different toolboxes, this issue may get messy. I will update this post to denote how updates to the Matlab and C toolboxes will be tracked once that is determined.
Finally, it would also be good if our wfdbcheck algorithm would catch these issues. This could prevent cases like those seen in the 2020 and 2021 challenges from getting through (i.e. even if a header is created outside of our toolboxes, we'd like to flag it if it isn't to specification).
ecg-arrhythmia database has a similar failure when downloading:
wfdb.dl_database("ecg-arrhythmia", self.path)
ValueError: time data '/' does not match format '%d/%m/%Y'
@nick-trivirum, thank you for bringing this to our attention. The issue in https://physionet.org/content/ecg-arrhythmia/1.0.0/ appears to be a bit different though. There are a couple of corrupted .hea files. I have asked the author to submit a new version but haven't heard back yet. I'd encourage you to also contact the corresponding author for this project if this is something you'd like to see fixed.
There is also a similar error in MGHDB:
>>> wfdb.rdheader("mgh025", "mghdb/1.0.0/.")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\sam\AppData\Local\Programs\Python\Python39\lib\site-packages\wfdb\io\record.py", line 1805, in rdheader
record_fields = _header._parse_record_line(header_lines[0])
File "C:\Users\sam\AppData\Local\Programs\Python\Python39\lib\site-packages\wfdb\io\_header.py", line 996, in _parse_record_line
record_fields["base_date"] = datetime.datetime.strptime(
File "C:\Users\sam\AppData\Local\Programs\Python\Python39\lib\_strptime.py", line 568, in _strptime_datetime
tt, fraction, gmtoff_fraction = _strptime(data_string, format)
File "C:\Users\sam\AppData\Local\Programs\Python\Python39\lib\_strptime.py", line 349, in _strptime
raise ValueError("time data %r does not match format %r" %
ValueError: time data '31/05/199' does not match format '%d/%m/%Y'
Header content:
mgh025 8 360/0.476 1533010 18:10:14 31/05/199
mgh025.dat 212 250(-320)/mV 12 0 -289 -18691 0 ECG Lead I
mgh025.dat 212 250/mV 12 0 57 -22985 0 ECG Lead II
mgh025.dat 212 250/mV 12 0 -71 26687 0 ECG Lead V
mgh025.dat 212 9.76(-1400)/mmHg 12 0 -1455 17745 0 ART
mgh025.dat 212 19.96(-1189)/mmHg 12 0 -1182 -15600 0 PAP
mgh025.dat 212 19.34(-1259)/mmHg 12 0 -1258 -28850 0 CVP
mgh025.dat 212 1000 12 0 -35 14931 0 Resp. Imp.
mgh025.dat 212 1000 12 0 -986 14233 0 CO2
#<age>: 69 <sex>: M <diagnoses>: Resection of bilateral iliac aneurysms
# PERTINENT HISTORY:
# Coronary disease
# PHARMACOLOGIC SUPPORT:
# TNG @ 100 mcg/min
# GENERAL COMMENTS:
# Stop/FFW @ 26 min
# ELECTROCARDIOGRAPHIC DATA:
# UNDERLYING RHYTHM:
# Atrial flutter with intra-ventricular conduction defect @ 54 bpm
# RHYTHM DISTURBANCES:
# ECG INTERPRETATION:
# Left atrial hypertrophy
# Non-specific sn wave changes
# Poor precordial R wave progression
# ? Old anteroseptal infarct
# Possible old inferior infarct
# TECHNICAL COMMENTS:
# Muscle tremor, agitated, out of bed
# ?Change in lead configuration
# LL and V electrode changed @ 26 min
# HEMODYNAMIC DATA:
# ART: 180/40 MEAN: 82
# PAP: 40/11 to 56/24 PCW: 11 (@ 13 min, 38 min, 43 min, 57 min)
# RAP: 8
# WAVEFORM PATTERNS:
# "Catheter whip" in PA trace
# Respiratory variation
# RESPIRATORY DATA:
# RATE: 7/16 changed to 5/16 bpm
# MODE OF VENTILATION:
# Intermittent mandatory ventilation
# CO2 RECORDING:
# On @ 26 min
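For the read side, a clearer message for a record line like the one above could be produced along these lines. This is only a sketch: `HeaderDateError` and `parse_base_date` are hypothetical names, and the code assumes the base date is the sixth whitespace-separated field of the record line, which holds for this header but not for record lines that omit optional fields.

```python
import datetime


class HeaderDateError(ValueError):
    """Hypothetical exception type used only in this sketch."""


def parse_base_date(record_line):
    """Parse the base date from a WFDB record line, or return None if absent.

    Assumes the fields appear in the order: record name, number of signals,
    sampling frequency, number of samples, base time, base date.
    """
    fields = record_line.split()
    if len(fields) < 6:
        return None  # no base date on this record line
    record_name, date_str = fields[0], fields[5]
    try:
        return datetime.datetime.strptime(date_str, "%d/%m/%Y").date()
    except ValueError as err:
        # Re-raise with the record name and the expected format spelled out.
        raise HeaderDateError(
            f"record {record_name!r}: base date {date_str!r} "
            "is not in DD/MM/YYYY format"
        ) from err
```

With the mgh025 record line, this raises `HeaderDateError` naming the record and the offending string `'31/05/199'` instead of surfacing the bare `strptime` failure.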
Hi,
This problem seems to have not been solved yet. When using wfdb to read the 'WFDBRecords/01/019/JS01052.hea' record, the same error still occurred: ValueError: time data '/' does not match format '%d/%m/%Y'
I only want to read the patient's basic information, like age, not the datetime information, so I don't need the datetime to be checked. What can I do to achieve this?
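One possible workaround, sketched under assumptions rather than offered as a supported wfdb feature: read the .hea file as plain text and look only at the comment lines, so the date field is never parsed. The helpers `read_header_comments` and `find_age` are hypothetical, and the sketch assumes the age is stored in a comment such as `#<age>: 69` (as in the MGHDB header above) or `#Age: 59`; check the actual header before relying on it.

```python
import re


def read_header_comments(lines):
    """Return the text of every header comment line (lines starting with '#')."""
    return [line.lstrip("#").strip() for line in lines if line.startswith("#")]


def find_age(comments):
    """Return the first age found in the comments as an int, or None.

    Assumed comment shapes: '<age>: 69 ...' or 'Age: 59'.
    """
    # The lookbehind avoids matching words that merely end in 'age'.
    pattern = re.compile(r"(?<![A-Za-z])age>?:\s*(\d+)", flags=re.IGNORECASE)
    for comment in comments:
        match = pattern.search(comment)
        if match:
            return int(match.group(1))
    return None
```

Called on the lines of a locally downloaded header (for example, `open(path).read().splitlines()`), `find_age(read_header_comments(lines))` returns the age without touching the record line at all.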