
MIMIC-III Matched: how to match numeric to waveform data

Open PatriciaBota opened this issue 11 months ago • 1 comments

I am only considering records that have PPG data in the .dat file (e.g. p00/p000333/3092245_0040.hea). How can I match the PPG with the numeric SpO2 in, e.g., 3092245n.dat?

The PPG and SpO2 have different lengths, even when accounting for sampling rate differences.

Is there a time reference I can use to determine when the PPG in 3092245_0040 was collected, so that I can match it to the SpO2 data in the numeric file 3092245n.dat?

Thank you very much.

PatriciaBota avatar Feb 03 '25 22:02 PatriciaBota

Hi Patricia,

From my understanding, the header file 3092245_0040.hea indicates that a PLETH (PPG) signal is available, along with ABP and lead II, for a total of three signals. The file duration is 2,794,752 samples at a sampling rate of 125 Hz, which corresponds to approximately 6 hours and 13 minutes of recording (2,794,752 / 125 ≈ 22,358 seconds). The start time of the segment is listed as 21:59:08.504.
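As a quick check of that duration, here is a minimal sketch that converts the sample count to a time span. The two numbers are the ones quoted from 3092245_0040.hea above; the variable names are just for illustration.

```python
from datetime import timedelta

n_samples = 2_794_752  # segment length quoted from 3092245_0040.hea
fs = 125               # sampling frequency in Hz

# Duration in seconds is simply samples divided by samples-per-second.
duration = timedelta(seconds=n_samples / fs)
print(duration)  # about 6 h 12 min 38 s
```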

If you compare this with the start time found in the master header of the numerics file for record 3092245 (the file called p000333-2147-04-25-22-02n.hea), you’ll see it starts at 22:02:05 — which doesn’t match. This is expected.

This difference arises because numeric values such as SpO₂ are stored in a single, continuous numerics file (ending in n), while high-volume signals like PLETH are split into multiple waveform segments per record.

To properly synchronize the two, you need to determine the exact time offset (or equivalently, the number of samples elapsed) between the start of the overall record and the beginning of your target segment — not just rely on the segment’s local start time.

You can find this offset in the master header file (e.g. p000333-2147-04-25-22-02.hea), which lists all segments associated with the record, along with their duration in samples. If you see a ~ followed by a number, that indicates a gap in the recording, which is common between segments and should be accounted for.

To compute the time offset of your target segment (e.g. 3092245_0040):

  1. Sum all the sample counts from the beginning of the master header up to, but not including, that segment (gaps included).
  2. Divide the total number of samples by the sampling rate (125 Hz) to get the elapsed time in seconds.
  3. Start reading the SpO₂ signal from the numerics file at this time offset, converting it to a sample index with the numerics file's own sampling rate.

That way, the waveform segment and the numerics (SpO₂) will be aligned in time.
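The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: it parses the master header as plain text (record line first, then one "name length" line per segment, with ~ lines as gaps), the function name segment_offset_seconds is made up, and the example header contains invented segment names and lengths, not the real contents of p000333-2147-04-25-22-02.hea. The numerics sampling rate is also a placeholder; read the actual value from the numerics header.

```python
def segment_offset_seconds(master_header: str, target: str) -> float:
    """Seconds elapsed from the record start to the start of `target`."""
    lines = [ln.strip() for ln in master_header.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    # Record line: "<name>/<n_segments> <n_signals> <fs> <total_samples> ..."
    # Keep only the part before any "/" in the frequency field.
    fs = float(lines[0].split()[2].split("/")[0])

    elapsed = 0  # samples seen before the target segment
    for ln in lines[1:]:
        name, n_samples = ln.split()[:2]
        if name == target:
            return elapsed / fs
        # Gap lines ("~ <n>") and zero-length layout lines are summed too.
        elapsed += int(n_samples)
    raise ValueError(f"segment {target!r} not found in master header")


# Invented example header: values are NOT taken from the real record.
EXAMPLE_HEADER = """\
p000333-2147-04-25-22-02/4 3 125 6000
3092245_layout 0
3092245_0001 1000
~ 500
3092245_0002 4500
"""

offset_s = segment_offset_seconds(EXAMPLE_HEADER, "3092245_0002")

# Assumed numerics sampling rate in Hz; take the real one from the n.hea file.
numerics_fs = 1.0
numerics_start_index = round(offset_s * numerics_fs)
print(offset_s, numerics_start_index)
```

With the real master header text in place of EXAMPLE_HEADER and "3092245_0040" as the target, numerics_start_index would be the first SpO₂ sample to read, and the segment duration times numerics_fs gives how many numerics samples to take from there.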

I’m not an expert in this domain, but I use a similar technique in my own research, and I hope this approach might be helpful to you.

Best of luck, Vincent

vmg-dev1 avatar May 28 '25 11:05 vmg-dev1