TransmogrifAI

Changing imputation for nulls in DateToUnitCircleTransformer

Open · opened by michaelweilsalesforce · 0 comments

Problem: When using DateToUnitCircleTransformer, null dates are replaced with (0, 0), which does not lie on the unit circle. Moreover, with DateToUnitCircleTransformer and TimePeriod HourOfDay, dates in the format MM-DD-YYYY are converted to MM-DD-YYYY 00h00m00s and therefore get the circular representation (1, 0). We would expect null values to map to (1, 0) as well.
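To make the problem concrete, here is a minimal Python sketch of the HourOfDay encoding as the issue describes it. It is not TransmogrifAI code: the function name `hour_of_day_to_unit_circle` is hypothetical, and it assumes the angle is 2π·hour/24 measured from the positive x-axis, which is consistent with midnight mapping to (1, 0) as stated above. It shows that a midnight timestamp lands on the unit circle while the (0, 0) null default sits at the origin.

```python
import math
from typing import Optional, Tuple

# Hypothetical helper, not TransmogrifAI's API. Assumes a 24-hour period and
# an angle of 2*pi*hour/24, so hour 0 maps to (cos 0, sin 0) = (1, 0).
HOURS_PER_DAY = 24


def hour_of_day_to_unit_circle(hour: Optional[int]) -> Tuple[float, float]:
    """Map an hour of day to (cos, sin); nulls get the current (0, 0) default."""
    if hour is None:
        # Null default described in the issue: the origin, which is NOT on
        # the unit circle (its distance from the origin is 0, not 1).
        return (0.0, 0.0)
    radians = 2 * math.pi * hour / HOURS_PER_DAY
    return (math.cos(radians), math.sin(radians))


# A date parsed from MM-DD-YYYY is treated as 00h00m00s, i.e. hour 0.
midnight = hour_of_day_to_unit_circle(0)
null_value = hour_of_day_to_unit_circle(None)
print(midnight)                 # (1.0, 0.0)
print(math.hypot(*midnight))    # 1.0 -> on the unit circle
print(math.hypot(*null_value))  # 0.0 -> off the unit circle
```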

Solution: Use (1, 0) instead of (0, 0) as the default value for nulls.

Alternatives: The alternatives are not specific to this transformer; they also concern the other vectorizers that can use the mode as their imputation technique. Instead of imputing the mode, randomly select an existing non-null value so that the distribution of the feature is not changed. However, this remains difficult:

  • DateToUnitCircleTransformer is not an estimator, so it has no fit step in which to learn the non-null values.
  • As an estimator, it would have to store all the distinct non-null values of the dataset as a fitted parameter.
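The estimator alternative can be sketched as follows. This is a hypothetical Python illustration, not an existing TransmogrifAI class: `RandomSampleImputer`, `fit`, and `transform` are names chosen here. One assumption goes slightly beyond the issue: besides the distinct non-null values, the sketch stores their counts and samples in proportion to them, since drawing uniformly over distinct values alone would not keep the feature's distribution unchanged. The fitted state still grows with the number of distinct values, which is the storage cost the second bullet points out.

```python
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence


class RandomSampleImputer:
    """Hypothetical estimator: fit() stores the distinct non-null values and
    their counts; transform() replaces each null with a value drawn with
    probability proportional to how often it was observed."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self.values_: List[Hashable] = []
        self.weights_: List[int] = []

    def fit(self, column: Sequence[Optional[Hashable]]) -> "RandomSampleImputer":
        counts = Counter(v for v in column if v is not None)
        if not counts:
            raise ValueError("cannot fit on a column with no non-null values")
        # Fitted parameters: one entry per distinct non-null value.
        self.values_ = list(counts.keys())
        self.weights_ = list(counts.values())
        return self

    def transform(self, column: Sequence[Optional[Hashable]]) -> List[Hashable]:
        # Non-null values pass through unchanged; each null gets a random draw.
        return [
            v if v is not None
            else self._rng.choices(self.values_, weights=self.weights_, k=1)[0]
            for v in column
        ]


hours = [0, 0, 0, 13, None, None]
imputer = RandomSampleImputer(seed=42).fit(hours)
print(imputer.transform(hours))
```

Weighting by counts keeps the expected distribution of imputed values equal to the observed one; the trade-off is that the imputed values, unlike the mode, are non-deterministic unless a seed is fixed.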

Additional context: This came up in a case where the HourOfDay circular representation of MM-DD-YYYY dates (all at 00h00m00s) was not thrown out by SanityChecker because its variance was not 0: the non-null dates all map to (1, 0), but the nulls map to (0, 0).
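The effect on variance can be checked with a small Python sketch. It uses a simplified zero-variance test as a stand-in for SanityChecker (SanityChecker's real criteria are not reproduced here), and the names `encode` and `has_zero_variance` are hypothetical. Because every non-null MM-DD-YYYY date is midnight, all non-null rows map to (1, 0); only the null default decides whether the column is constant.

```python
from statistics import pvariance
from typing import List, Optional, Tuple

Point = Tuple[float, float]
MIDNIGHT: Point = (1.0, 0.0)  # HourOfDay encoding of a 00h00m00s timestamp


def encode(column: List[Optional[str]], null_default: Point) -> List[Point]:
    """Every non-null MM-DD-YYYY date is midnight, so it maps to MIDNIGHT."""
    return [MIDNIGHT if d is not None else null_default for d in column]


def has_zero_variance(points: List[Point]) -> bool:
    """Simplified zero-variance filter (not SanityChecker's code): true when
    both the cos and the sin columns are constant."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return pvariance(xs) == 0 and pvariance(ys) == 0


dates = ["06-07-2021", None, "01-15-2020", None]
print(has_zero_variance(encode(dates, null_default=(0.0, 0.0))))  # False
print(has_zero_variance(encode(dates, null_default=(1.0, 0.0))))  # True
```

With the current (0, 0) default the cos column alternates between 1 and 0, so the feature survives a variance check; with the proposed (1, 0) default it is constant and would be dropped.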

michaelweilsalesforce, Jun 07 '21 14:06