orange3
Recursive imputer
Use case: We have a model with 200 features. We apply the “Model-based imputer (simple tree)” to fill in missing data.
Problem: The “Imputer” widget fills in just a part of the missing data (that's the limit of the default 1-NN regressor used).
Current workaround: We have chained 5 instances of the same imputer, that is, as many as needed to “complete” the imputation for our data. At each stage more data are imputed, up to the point where the regressor cannot produce further predictions.
Proposed solution: Add a check-box in the Impute widget, to activate iteration. The process will be repeated leveraging the imputed data from the previous iteration. Loop until no more changes are produced.
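To make the proposal concrete, here is a minimal pure-Python sketch of the "repeat until nothing changes" loop. It does not use Orange's API: `impute_once` is a hypothetical stand-in for a single Impute widget, and its rule (predict column j from column j-1 with a 1-NN lookup, only when the predictor value is known) is an assumption chosen just to show why one pass can leave gaps that a later pass fills.

```python
from typing import Callable, List, Optional, Tuple

Row = List[Optional[float]]
Table = List[Row]


def impute_once(table: Table) -> Table:
    """Hypothetical single imputation pass (stand-in for one Impute widget).

    Column j is predicted from column j-1 by 1-NN, but only for rows
    where column j-1 is already known. Predictions are computed from the
    input snapshot, so values filled in this pass are not reused until
    the next pass.
    """
    n_cols = len(table[0])
    result = [list(row) for row in table]
    for j in range(1, n_cols):
        # Rows where both the predictor and the target are observed.
        donors = [(r[j - 1], r[j]) for r in table
                  if r[j - 1] is not None and r[j] is not None]
        if not donors:
            continue
        for i, row in enumerate(table):
            if row[j] is None and row[j - 1] is not None:
                _, value = min(donors, key=lambda d: abs(d[0] - row[j - 1]))
                result[i][j] = value
    return result


def impute_until_stable(
    table: Table,
    step: Callable[[Table], Table] = impute_once,
    max_rounds: int = 100,
) -> Tuple[Table, int]:
    """Repeat `step` until a pass changes nothing (or max_rounds is hit)."""
    current = [list(row) for row in table]
    rounds = 0
    while rounds < max_rounds:
        updated = step(current)
        if updated == current:  # no new values were imputed
            break
        current = updated
        rounds += 1
    return current, rounds


data = [
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [1.1, None, None],   # needs two passes: column 1 first, then column 2
    [None, 19.0, None],  # column 0 has no predictor, so it stays missing
]
filled, rounds = impute_until_stable(data)
print(rounds, filled)
```

In this toy data the loop stops after two effective passes, and one value stays missing because the stand-in model has nothing to predict it from. That mirrors the behaviour we see when chaining widgets by hand.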
Note:
- https://scikit-learn.org/stable/modules/impute.html#iterative-imputer
- https://cran.r-project.org/web/packages/mice/mice.pdf
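For reference, this is roughly how the scikit-learn imputer linked above is used. It is only a sketch on made-up data, assuming scikit-learn and NumPy are installed. `IterativeImputer` is still marked experimental, so it has to be enabled explicitly before import.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Made-up example data; np.nan marks the missing entries.
X = np.array([
    [1.0, 10.0, 100.0],
    [2.0, 20.0, 200.0],
    [3.0, np.nan, 300.0],
    [np.nan, 40.0, np.nan],
])

# Each feature with missing values is modelled from the other features,
# and the round-robin is repeated for up to max_iter rounds.
imputer = IterativeImputer(max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Observed values are left untouched; only the `np.nan` cells are replaced.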
Although this method is sometimes used in practice, I have reservations about it: with multiple rounds of imputation, you just stack guess upon guess upon guess ...
A bigger problem, though, is that implementing such imputation might be tricky in Orange, which works so that each variable gets its own imputer, while in this method variables are imputed sequentially, with multiple rounds. If having this method would require major changes in the way variables work, I don't think it's worth it.
@markotoplak, would SharedValue help here? (This question should not be interpreted as my endorsement of implementing this method. :)
I confirm that, in our specific case, iterative imputation gives consistent values and is by far the best option for filling in the missing data.
We discussed this at a core team meeting. The situation you describe shouldn't have happened: models in Orange should never output a missing value. A single iteration of imputation should have imputed all values. Could you upload a minimal example (data + workflow) that reproduces the bug?
This, on the other hand, means that the feature you wish for will not be implemented because it is quite incompatible with Orange's existing architecture.
I totally agree. I'll prepare the debug case and get back to you soon.
OK, here are the files for you to check: Orange - Check 6001.zip