wfdb-python

Rounding errors in to_dataframe

Open • bemoody opened this issue 1 year ago • 0 comments

The wfdb.Record.to_dataframe method generates a pandas DataFrame from a Record object. The index of the resulting DataFrame is the time of each sample: the absolute time if base_datetime is set, otherwise the time elapsed since the start of the record.

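This code, however, accumulates significant rounding error over a long record. The sampling interval 1/fs is converted to a pd.Timedelta, which can only approximate it, and that approximate interval is then used as the step between every pair of consecutive samples, so the error grows with the sample number: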

        if self.base_datetime is not None:
            index = pd.date_range(
                start=self.base_datetime,
                periods=self.sig_len,
                freq=pd.Timedelta(seconds=1 / self.fs),
            )
        else:
            index = pd.timedelta_range(
                start=pd.Timedelta(0),
                periods=self.sig_len,
                freq=pd.Timedelta(seconds=1 / self.fs),
            )
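To see why the error grows, write t_0 for the start time, f_s for the sampling frequency, and \hat{\Delta} for 1/f_s after conversion to pd.Timedelta (rounded or truncated to the resolution that type supports). Assuming the n-th index value is start + n × freq, the index value differs from the true sample time by an amount proportional to n:

$$
t_n = t_0 + \frac{n}{f_s}, \qquad
\hat{t}_n = t_0 + n\,\hat{\Delta}, \qquad
\hat{t}_n - t_n = n\left(\hat{\Delta} - \frac{1}{f_s}\right)
$$

Even a sub-microsecond error in \hat{\Delta} becomes large once n reaches millions of samples.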

For example:

$ python3
>>> import wfdb
>>> r = wfdb.rdrecord('81739927', pn_dir='mimic4wdb/0.1.0/waves/p100/p10014354/81739927')
>>> str(r.base_datetime)
'2148-08-16 09:00:17.566000'
>>> r.fs
62.4725
>>> r.sig_len
6661120
>>> r.to_dataframe()
                             I     II    III      V  aVR     Pleth      Resp
2148-08-16 09:00:17.566000 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.582007 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.598014 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.614021 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
2148-08-16 09:00:17.630028 NaN    NaN    NaN    NaN  NaN       NaN -0.751374
...                         ..    ...    ...    ...  ...       ...       ...
2148-08-17 14:37:22.033805 NaN -0.220 -0.285 -0.025  NaN  0.404297  0.487477
2148-08-17 14:37:22.049812 NaN -0.030  0.005  0.025  NaN  0.396484  0.530238
2148-08-17 14:37:22.065819 NaN -0.065 -0.030 -0.015  NaN  0.386475  0.574832
2148-08-17 14:37:22.081826 NaN -0.265 -0.255 -0.125  NaN  0.375977  0.621258
2148-08-17 14:37:22.097833 NaN -0.550 -0.610 -0.355  NaN  0.366211  0.664020

[6661120 rows x 7 columns]
>>> str(r.get_absolute_time(6661119))
'2148-08-17 14:37:22.384920'

$ wfdbtime -r mimic4wdb/0.1.0/waves/p100/p10014354/81739927/ s6661119
       s6661119    29:37:04.819 [14:37:22.385 17/08/2148]

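Here, get_absolute_time is correct to the nearest microsecond, and the wfdbtime command is correct to the nearest millisecond. The last index value produced by to_dataframe (14:37:22.097833), however, is off by 0.287 seconds. The index values above advance by 0.016007 s per sample, whereas 1/fs ≈ 0.0160070431 s; that shortfall of about 43 ns per sample, multiplied by 6,661,119 samples, accounts for the whole discrepancy.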
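The arithmetic can be reproduced with only the standard library. This sketch assumes the step was rounded to a whole number of microseconds, which is what the index values in the output above are consistent with; Python's datetime.timedelta rounds float inputs in the same way. The variable names are illustrative, and the constants are copied from the session above.

```python
from datetime import datetime, timedelta

fs = 62.4725        # sampling frequency of the example record
n = 6661119         # number of the last sample (sig_len - 1)
start = datetime(2148, 8, 16, 9, 0, 17, 566000)  # base_datetime

# Accumulating approach: round the step once, then multiply it by n.
step = timedelta(seconds=1 / fs)        # rounded to whole microseconds
accumulated = start + n * step

# Direct approach: compute the offset from the sample number, round once.
direct = start + timedelta(seconds=n / fs)

drift = direct - accumulated
print(step)         # 0:00:00.016007
print(accumulated)  # 2148-08-17 14:37:22.097833
print(direct)       # 2148-08-17 14:37:22.384920
print(drift)        # 0:00:00.287087
```

The accumulated value matches the last row of the DataFrame, and the direct value matches get_absolute_time, because the direct computation rounds only once and so stays within half a microsecond of the true time.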

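I think this would be avoided by passing start, end and periods to date_range or timedelta_range, with end computed directly from (sig_len - 1) / fs, rather than passing start and freq. Each index value would then be rounded on its own instead of accumulating the rounding error of a single step.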
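A minimal sketch of that approach, assuming pandas is available. sample_index is a hypothetical helper, not part of wfdb. It computes the time of the last sample from (sig_len - 1) / fs and, because freq is omitted, lets date_range or timedelta_range place sig_len values linearly between start and end (both inclusive).

```python
import pandas as pd


def sample_index(sig_len, fs, base_datetime=None):
    """Hypothetical helper: build a per-sample time index without
    accumulating a rounded step between consecutive samples."""
    # Elapsed time of the last sample, computed from its sample number.
    duration = pd.Timedelta(seconds=(sig_len - 1) / fs)
    if base_datetime is not None:
        start = pd.Timestamp(base_datetime)
        # start + end + periods (no freq): linearly spaced values.
        return pd.date_range(start=start, end=start + duration, periods=sig_len)
    return pd.timedelta_range(start=pd.Timedelta(0), end=duration, periods=sig_len)


idx = sample_index(6661120, 62.4725, pd.Timestamp("2148-08-16 09:00:17.566"))
print(idx[-1])  # within 1 us of 2148-08-17 14:37:22.384920
```

With the sizes from the record above, the last index value lands within a microsecond of get_absolute_time(6661119), instead of 0.287 s early.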

bemoody • Mar 29 '23 21:03