
[DDFG] Complete MNAR missingness generation

Open mm-abogdan opened this issue 4 years ago • 0 comments

Complete the mnar method in the Corruptor class.

In simple terms, MNAR (Missing Not at Random) is a type of missingness in which the probability of a value being missing depends, in whole or in part, on unobserved data. Missingness may depend on observed data at the same time.
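To make the definition concrete, here is a minimal sketch of one common MNAR case, where a value's chance of being missing depends on the value itself (which becomes unobserved once it is masked). The `income` variable, the logistic form, and the slope of 2.0 are illustrative assumptions, not part of impyute.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical variable; any continuous feature would do.
income = rng.lognormal(mean=10.0, sigma=0.5, size=1000)

# Standardize on the log scale, then let the probability of being
# missing rise with the value itself (assumed logistic link).
z = (np.log(income) - 10.0) / 0.5
p_missing = 1.0 / (1.0 + np.exp(-2.0 * z))

# Draw the missingness indicator and mask the affected entries.
is_missing = rng.random(income.size) < p_missing
observed = np.where(is_missing, np.nan, income)

# High values are preferentially removed, so the mean of what is
# left understates the true mean.
print(income.mean(), np.nanmean(observed))
```

Because the cause of the missingness is exactly what gets hidden, nothing in the remaining data can fully explain the pattern, which is what separates MNAR from MAR.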

Implementation: Generate a random set of new features and base missingness on them. The number of features to generate may be some fraction of the number of existing features, or a random integer between 1 and n_features. These features could (should?) be a mix of continuous and categorical; the mix could follow the fraction of each feature type among the existing features. Once the new features are generated, impose missingness on the original data based on them.
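The approach above could be sketched roughly as follows. This is not the Corruptor.mnar implementation: the function name `mnar_sketch`, its parameters, the three-level categorical features, and the choice to mask the highest-scoring entries per column are all assumptions made for illustration.

```python
import numpy as np


def mnar_sketch(data, frac_latent=0.5, frac_categorical=0.5,
                missing_rate=0.2, seed=None):
    """Hypothetical helper: impose MNAR missingness driven by
    generated features that are never returned to the caller."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape

    # Number of hidden features as a fraction of existing features.
    n_latent = max(1, int(round(frac_latent * n_cols)))
    n_cat = int(round(frac_categorical * n_latent))
    n_cont = n_latent - n_cat

    # Mix of continuous and categorical hidden features
    # (categories are assumed to be the integer codes 0, 1, 2).
    continuous = rng.normal(size=(n_rows, n_cont))
    categorical = rng.integers(0, 3, size=(n_rows, n_cat)).astype(float)
    latent = np.hstack([continuous, categorical])

    # Each original column gets a score from a random linear
    # combination of the hidden features.
    weights = rng.normal(size=(n_latent, n_cols))
    scores = latent @ weights

    # Mask the entries whose score falls in the top `missing_rate`
    # share of their column.
    thresholds = np.quantile(scores, 1.0 - missing_rate, axis=0)
    mask = scores > thresholds

    corrupted = data.copy()
    corrupted[mask] = np.nan
    return corrupted
```

A call such as `mnar_sketch(np.random.rand(100, 4), seed=1)` takes a matrix and returns a matrix of the same shape with NaN marking the removed entries. Since the hidden features are discarded, the missingness depends only on data the analyst never sees.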

Make sure that functions accept and return matrices, and follow the 4 steps outlined in contributing.md.

The labels below are for DDFG (Data Days for Good) participant reference: Priority: High; Difficulty: Medium.

https://github.com/eltonlaw/impyute/blob/2c25368576558374d385293f65c883a91dff5027/impyute/dataset/corrupt.py#L48-L50

mm-abogdan, Jul 08 '19 13:07