
Understand Imputer Performance

Open chukarsten opened this issue 2 years ago • 0 comments

In https://github.com/alteryx/evalml/pull/3657, we added logic to add the Imputer only when nulls are detected. This led to a marked performance increase on the datasets that had no missing values. That result is confusing, because an Imputer should function as a no-op for datasets with no missing values.

Acceptance Criteria for this story are:

  1. Build a set of 3 datasets with no missing values, 2 datasets with 1-20% missing values, 1 dataset with 20-50% missing values.
  2. Generate a performance test baseline showing how just our Imputer behaves on these datasets, perhaps iterating over different random splits of the data and hundreds/thousands of loops.
  3. Profile the Imputer and see where the majority of the time is being spent. Perhaps functionalize pieces of .fit() or .transform() to try to capture the time taken by type shifting/inference.
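
As a starting point for the criteria above, here is a minimal, standard-library-only sketch of the harness. It is an illustration under assumptions, not evalml code: `make_dataset`, `mean_impute`, `baseline`, and `profile` are hypothetical names, `mean_impute` is a stand-in for the real Imputer, and the specific missing-value fractions and dataset sizes are arbitrary picks inside the ranges listed.

```python
import cProfile
import io
import pstats
import random
import statistics
import timeit


def make_dataset(n_rows, n_cols, missing_frac, seed):
    """Return a list of columns; round(n_rows * n_cols * missing_frac) cells are None."""
    rng = random.Random(seed)
    n_cells = n_rows * n_cols
    missing = set(rng.sample(range(n_cells), round(n_cells * missing_frac)))
    return [
        [None if c * n_rows + r in missing else rng.gauss(0.0, 1.0)
         for r in range(n_rows)]
        for c in range(n_cols)
    ]


def mean_impute(columns):
    """Stand-in imputer (hypothetical, NOT evalml's Imputer): fill None with the column mean."""
    out = []
    for col in columns:
        present = [v for v in col if v is not None]
        fill = statistics.fmean(present) if present else 0.0
        out.append([fill if v is None else v for v in col])
    return out


# Three datasets with no missing values, two in the 1-20% range, one in the
# 20-50% range. The exact fractions are assumptions chosen for illustration.
MISSING_FRACS = [0.0, 0.0, 0.0, 0.05, 0.15, 0.35]


def baseline(impute, n_rows=2000, n_cols=10, loops=10, repeats=3):
    """Time `impute` on each dataset; return {dataset index: best seconds per call}."""
    results = {}
    for i, frac in enumerate(MISSING_FRACS):
        data = make_dataset(n_rows, n_cols, frac, seed=i)
        times = timeit.repeat(lambda: impute(data), number=loops, repeat=repeats)
        results[i] = min(times) / loops
    return results


def profile(impute, data, top=5):
    """Run `impute` once under cProfile; return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    impute(data)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    for idx, secs in baseline(mean_impute).items():
        print(f"dataset {idx} ({MISSING_FRACS[idx]:.0%} missing): {secs * 1e3:.3f} ms/call")
    print(profile(mean_impute, make_dataset(2000, 10, 0.0, seed=0)))
```

To measure the real component, `mean_impute` would be swapped for a callable that wraps the Imputer's .fit() and .transform() on an equivalent tabular dataset. Comparing the 0% datasets against the others in the profile output should show whether the time goes to the imputation itself or to surrounding work such as type inference.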

chukarsten, Aug 15 '22 20:08