
Enhancement: allow iterating signals in chunks of dataframes

Open · thomasdziedzic-calmwave opened this issue 3 years ago · 3 comments

I'm using the new to_dataframe() function that was implemented in https://github.com/MIT-LCP/wfdb-python/pull/380

One issue I'm seeing is that loading some of the waveform signals from https://physionet.org/content/mimic3wdb-matched/1.0/ with to_dataframe() uses a lot of memory. Specifically, on the machine I'm running on, which has 96 GB of RAM, reading the record and calling to_dataframe() runs out of memory.

I would like to lazily load the signal data as a sequence of dataframe chunks, so that I can process the waveform signals in pieces that fit into memory rather than loading everything at once.

thomasdziedzic-calmwave commented Nov 29 '22

I worked around this by reading the record's header to get the signal length and then building my own chunking loop around rdrecord(sampfrom=..., sampto=...), roughly as sketched below, which unblocks me. Before closing this, though, it might be worth discussing what the maintainers think the right solution is, and whether there should be a documented approach to this problem.
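
For reference, a minimal sketch of that workaround, assuming the existing wfdb.rdheader / wfdb.rdrecord calls and the to_dataframe() method from #380; the record name, chunk size, and process() function are placeholders:

```python
import wfdb

RECORD = "p000020-2183-04-28-17-47"  # placeholder record name
CHUNK_SIZE = 10_000_000              # samples per chunk; tune to available memory

header = wfdb.rdheader(RECORD)       # reads the header only, no signal data

for start in range(0, header.sig_len, CHUNK_SIZE):
    stop = min(start + CHUNK_SIZE, header.sig_len)
    record = wfdb.rdrecord(RECORD, sampfrom=start, sampto=stop)
    df = record.to_dataframe()       # DataFrame holding only this chunk
    process(df)                      # user-defined per-chunk processing
```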

thomasdziedzic-calmwave commented Nov 30 '22

Thanks @thomasdziedzic-calmwave, let's keep this issue open. I think it would be good to try to address the problem directly, perhaps as an argument to to_dataframe().
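
One possible shape for that argument would be a chunksize-style option (as in pandas.read_csv) that yields successive dataframes instead of one large frame. Purely as an illustration of the behaviour, here is a rough sketch written as an external helper on top of the existing API; the function name and default chunk size are hypothetical, not part of wfdb:

```python
from typing import Iterator

import pandas as pd
import wfdb


def iter_dataframes(record_name: str, chunk_size: int = 10_000_000) -> Iterator[pd.DataFrame]:
    """Yield a record's signals as successive pandas DataFrames.

    Hypothetical helper for illustration only; it just wraps the existing
    rdheader/rdrecord/to_dataframe calls.
    """
    header = wfdb.rdheader(record_name)
    for start in range(0, header.sig_len, chunk_size):
        stop = min(start + chunk_size, header.sig_len)
        yield wfdb.rdrecord(record_name, sampfrom=start, sampto=stop).to_dataframe()
```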

tompollard commented Nov 30 '22

Should we consider adopting Dask dataframes? My understanding is that they are better suited to datasets that are too large to fit in RAM: https://docs.dask.org/en/stable/
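
A rough sketch of how that could look, building a lazy Dask DataFrame from per-chunk pandas frames via dask.delayed and dd.from_delayed; the record name and chunk size are placeholders, and this only uses existing wfdb and Dask calls:

```python
import dask
import dask.dataframe as dd
import wfdb

RECORD = "p000020-2183-04-28-17-47"   # placeholder record name
CHUNK_SIZE = 10_000_000               # samples per Dask partition

header = wfdb.rdheader(RECORD)        # header only, to get sig_len


@dask.delayed
def read_chunk(start, stop):
    # Each partition is read lazily, only when the Dask graph is computed.
    return wfdb.rdrecord(RECORD, sampfrom=start, sampto=stop).to_dataframe()


bounds = list(range(0, header.sig_len, CHUNK_SIZE)) + [header.sig_len]
parts = [read_chunk(a, b) for a, b in zip(bounds[:-1], bounds[1:])]

# from_delayed assembles the delayed per-chunk frames into one lazy DataFrame.
ddf = dd.from_delayed(parts)
print(ddf.mean().compute())  # e.g. per-channel means without loading the whole record
```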

tompollard commented Nov 30 '22