feature_engine
multivariate imputation
In multivariate imputation, we estimate the values of missing data using regression or classification models based on the other variables in the data.
The IterativeImputer allows us to use only one of regression or classification. But often we have binary, discrete and continuous variables in our datasets. So we would like to use a suitable model for each variable to carry out the imputation.
Can we design a transformer that does exactly so?
It would either recognise binary, multiclass and continuous variables or ask the user to enter them, and then train suitable models to predict the values of the missing data for each variable type.
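A minimal sketch of how the "recognise the variable type" step could look, assuming pandas DataFrames as input. The helper name `infer_variable_types` and the `max_classes` threshold are hypothetical and not part of feature-engine; the rule used here (two or fewer observed values is binary, non-numeric or few integer-valued levels is multiclass, everything else is continuous) is only one possible heuristic.

```python
import numpy as np
import pandas as pd


def infer_variable_types(df, max_classes=10):
    """Group columns into binary, multiclass and continuous.

    Hypothetical helper: the max_classes cut-off is an assumption.
    """
    types = {"binary": [], "multiclass": [], "continuous": []}
    for col in df.columns:
        observed = df[col].dropna()  # ignore missing values when inspecting
        n_unique = observed.nunique()
        is_numeric = pd.api.types.is_numeric_dtype(observed)
        # Numeric columns holding only whole numbers may be discrete classes.
        integer_valued = is_numeric and bool((observed % 1 == 0).all())
        if n_unique <= 2:
            types["binary"].append(col)
        elif not is_numeric or (integer_valued and n_unique <= max_classes):
            types["multiclass"].append(col)
        else:
            types["continuous"].append(col)
    return types


df = pd.DataFrame(
    {
        "smoker": [0, 1, np.nan, 1],
        "city": ["a", "b", "c", None],
        "income": [1.5, 2.7, np.nan, 3.1],
        "rooms": [1, 2, 3, np.nan],
    }
)
types = infer_variable_types(df)
```

A heuristic like this will misclassify some columns (for example, a count variable with few levels), which is why letting the user pass the variable lists explicitly is the safer alternative.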
Looks fun, @solegalli! I'm happy to tackle this issue. Which issue do you prefer we address first? This issue or #107?
Hi @solegalli,
I see sklearn has an experimental version of the IterativeImputer. Do we still want to implement this transformer in feature-engine?
When training the transformer's estimator, will the transformer organize the non-missing values for the dependent variable as the training set and all the np.nan values as the "test set" or values to be predicted?
Also, given there are most likely np.nan scattered throughout the dataset, I'm assuming we should limit the estimators to models that handle np.nan, e.g., random forest.
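One way to picture the split described in the question, as a rough sketch only: rows where the target is observed form the training set, and rows where it is missing are the ones to predict. The function name `impute_one_column` is hypothetical, all columns are assumed numeric, and the predictors are given a simple mean fill first (similar in spirit to the initial fill that sklearn's IterativeImputer performs) so that the estimator never sees np.nan.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def impute_one_column(df, target):
    """Fit on rows where `target` is observed, predict rows where it is missing.

    Hypothetical sketch: assumes every column is numeric.
    """
    missing = df[target].isna()
    # Predictors may also contain NaN; fill them with column means
    # so the estimator itself does not need to handle missing values.
    X = df.drop(columns=target)
    X = X.fillna(X.mean())
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[~missing], df.loc[~missing, target])  # "training set"
    out = df.copy()
    if missing.any():
        out.loc[missing, target] = model.predict(X[missing])  # "test set"
    return out


df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, np.nan, 6.0],
        "b": [2.0, np.nan, 6.0, 8.0, 10.0, 12.0],
        "c": [1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    }
)
out = impute_one_column(df, "a")
```

Only the target column is changed in the returned frame; the other columns keep their missing values until it is their turn to be the target.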
Hi @Morgan-Sell
The IterativeImputer will return a continuous value to impute NA. But some variables are categorical, so instead of regression, classification would be more suitable.
NaN values are handled during the subsequent rounds of imputation, as the IterativeImputer does.
So I guess the only difference would be that our imputer is able to distinguish when to do regression and when to do classification. Or maybe it could even give the user the option to pass a list of categorical and numerical variables.
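A rough sketch of that idea, under several assumptions: the user passes the categorical and numerical lists, categorical columns are already numerically encoded, every column has at least one observed value, and random forests are used for both tasks. The function name `mixed_iterative_impute` and its parameters are hypothetical; this is not the feature-engine or sklearn API.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor


def mixed_iterative_impute(df, categorical, numerical, n_rounds=3, random_state=0):
    """Chained imputation: classifier for categorical, regressor for numerical.

    Hypothetical sketch; categorical columns must be numerically encoded.
    """
    columns = list(categorical) + list(numerical)
    mask = df[columns].isna()  # remember which cells were originally missing
    filled = df[columns].copy()
    # Initial fill so every model sees complete predictors:
    # mode for categorical columns, mean for numerical ones.
    for col in categorical:
        filled[col] = filled[col].fillna(filled[col].mode().iloc[0])
    for col in numerical:
        filled[col] = filled[col].fillna(filled[col].mean())
    # Each round re-estimates the originally missing cells of every column
    # from the current state of all the other columns.
    for _ in range(n_rounds):
        for col in columns:
            miss = mask[col]
            if not miss.any():
                continue
            if col in categorical:
                model = RandomForestClassifier(n_estimators=50, random_state=random_state)
            else:
                model = RandomForestRegressor(n_estimators=50, random_state=random_state)
            X = filled.drop(columns=col)
            model.fit(X[~miss], filled.loc[~miss, col])
            filled.loc[miss, col] = model.predict(X[miss])
    return filled


rng = np.random.default_rng(0)
x = rng.normal(size=60)
data = pd.DataFrame(
    {"g": (x > 0).astype(float), "x": x, "y": 2 * x + rng.normal(scale=0.1, size=60)}
)
data.loc[[1, 5, 9], "g"] = np.nan
data.loc[[2, 5, 20], "y"] = np.nan
imputed = mixed_iterative_impute(data, categorical=["g"], numerical=["x", "y"])
```

Because the categorical column is imputed with a classifier, its imputed values are always one of the observed classes rather than a fractional value in between.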
Also, I read the papers a while ago, but before drafting this class, it would be good to read the papers on MICE (Multivariate Imputation by Chained Equations) and MissForest.
Hi @solegalli,
Yeah, I read a paper on MICE. I saw there that R has a MICE package.
I'm going to table this one for the moment to focus on the other transformers. Maybe one of our wonderful collaborators will pick this one up ;)