
Time series with missingness

Open hojjatkarami opened this issue 10 months ago • 5 comments

Feature Description

I have developed a GAN framework for generating irregularly sampled time series with missing values. However, I cannot add it to synthcity because it does not support time series data loaders with missing values.

Do you have any solution? If not, it would be great if you could add this support in the future.

hojjatkarami avatar Apr 08 '24 16:04 hojjatkarami

Hi @hojjatkarami, Thanks for engaging with Synthcity!

We currently consider the scope of Synthcity to be generating synthetic records from complete datasets with no missing data. All data must be imputed in the real dataset before training, and none of our models generate missing values. However, we do already support generating synthetic time series datasets from real irregular time series datasets. Such datasets could be said to contain missing time points in theory, but they do not actually contain any missing values marked with placeholders. You just need to label the time points you have in your dataloader.
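To illustrate the distinction between irregular time points and missing values, here is a minimal pandas sketch. The column names (`patient_id`, `time_h`, `heart_rate`) and the values are hypothetical, and the snippet does not use or represent any Synthcity dataloader API. It only shows that an irregular series stored in long format has no placeholders, while forcing the same data onto a shared time grid introduces them.

```python
import pandas as pd

# Hypothetical irregular observations: each row is a time point that was actually recorded.
long_df = pd.DataFrame(
    {
        "patient_id": [1, 1, 1, 2, 2],
        "time_h": [0.0, 1.5, 4.0, 0.0, 3.0],
        "heart_rate": [82.0, 85.0, 79.0, 91.0, 88.0],
    }
)

# Irregular sampling alone needs no placeholders: every stored row is fully observed.
n_missing_long = int(long_df.isna().sum().sum())

# Pivoting onto a shared time grid creates NaN placeholders wherever
# a patient has no observation at that grid point.
wide_df = long_df.pivot(index="patient_id", columns="time_h", values="heart_rate")
n_missing_wide = int(wide_df.isna().sum().sum())

print(n_missing_long, n_missing_wide)
```

In the long format the time stamps themselves carry the irregularity, which is the sense in which labelling the observed time points is enough.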

Is this the sort of thing you mean, or are you suggesting something else, like generating a dataset with missing values in it or training on a dataset with multiple features at irregular time points with some (but not all) feature values missing?

robsdavis avatar Apr 11 '24 13:04 robsdavis

Hi @robsdavis,

I am thinking about irregularly sampled time series with missingness such as clinical time series of ICU patients. So, at each time stamp, a few variables might be missing.

hojjatkarami avatar Apr 12 '24 12:04 hojjatkarami

Thanks for your response. Are you generating synthetic data from data with missingness, or generating synthetic data with missingness? In either case, we currently consider that out of scope for Synthcity, as it is possible to 1) impute missing values before using Synthcity to create synthetic records or 2) create a synthetic dataset with Synthcity and then delete values afterwards to create missing data. Can I ask what the method you have developed offers over these approaches?
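As a rough sketch of those two options, using only pandas and NumPy: the table, the column names, the imputation rule (forward-fill within each patient, then the column mean) and the 50% deletion rate are all assumptions chosen for illustration. The generator step is replaced by a stand-in copy, since no Synthcity call is shown here.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical hourly table with gaps in one laboratory variable.
real = pd.DataFrame(
    {
        "patient_id": [1, 1, 1, 2, 2, 2],
        "hour": [0, 1, 2, 0, 1, 2],
        "lactate": [1.2, np.nan, 1.8, np.nan, 2.4, np.nan],
    }
)

# Option 1: impute before training.
# Forward-fill within each patient, then fall back to the column mean
# for values that have no earlier observation to carry forward.
imputed = real.copy()
imputed["lactate"] = imputed.groupby("patient_id")["lactate"].ffill()
imputed["lactate"] = imputed["lactate"].fillna(imputed["lactate"].mean())

# Option 2: delete values from a complete table after generation.
# `synthetic` is a stand-in for generator output, not a real synthetic dataset.
synthetic = imputed.copy()
drop_mask = rng.random(len(synthetic)) < 0.5  # assumed deletion rate
with_missing = synthetic.copy()
with_missing.loc[drop_mask, "lactate"] = np.nan
```

Note that deleting values uniformly at random, as in this sketch, produces values missing completely at random; reproducing a missingness pattern that depends on the data would need a different deletion rule.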

robsdavis avatar Apr 15 '24 08:04 robsdavis

I mean generating synthetic data from data with missingness (so no imputation is needed). Consider hourly measurements of ICU patients. In this case, the missingness rate is very high for many laboratory variables, and the type of missingness is MNAR. I would like to add the following model to synthcity: https://github.com/hojjatkarami/TimEHR .

hojjatkarami avatar Apr 15 '24 14:04 hojjatkarami

Hi @robsdavis,

I have added TimEHR to a forked repo of synthcity. Could you please check the tutorial and give feedback on it? https://github.com/hojjatkarami/synthcity/blob/timehr/tutorials/plugins/time_series/TimEHR/plugin_timehr.ipynb

hojjatkarami avatar May 06 '24 13:05 hojjatkarami