TransmogrifAI

Null Imputation for DateUnitCircleVectorizer

Status: Open. michaelweilsalesforce opened this issue 3 years ago • 9 comments

Related issues: Issue

Describe the proposed solution: Use (1, 0) instead of (0, 0) as the default value for nulls.

Describe alternatives you've considered: The alternatives concern not only this transformer but also the other vectorizers that can use the mode as their imputation technique. Instead of imputing the mode, one could randomly select an existing non-null value so that the distribution of the feature is not changed. However, this remains difficult:

  • DateToUnitCircleTransformer is not an estimator, so it has no fitting step in which the observed values could be collected.
  • As an estimator, it would have to store all the distinct non-null values of the dataset as a fitted parameter.
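
The random-imputation alternative can be sketched as below. This is a hypothetical Python illustration, not TransmogrifAI code (which is Scala); the class `RandomObservedValueImputer` and its `fit`/`transform` methods are made-up names that only mimic the estimator pattern. It stores each distinct non-null value together with its count and samples in proportion to those counts, because drawing uniformly from the distinct values alone would not preserve the feature's distribution.

```python
import random
from collections import Counter
from typing import Hashable, Iterable, List, Optional


class RandomObservedValueImputer:
    """Hypothetical estimator-style imputer (illustration only).

    fit() records every distinct non-null value with its frequency;
    transform() replaces each None with a value drawn in proportion
    to those frequencies, so the imputed column keeps the observed
    distribution instead of piling up on the mode.
    """

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self.values: List[Hashable] = []
        self.weights: List[int] = []

    def fit(self, column: Iterable[Optional[Hashable]]) -> "RandomObservedValueImputer":
        counts = Counter(v for v in column if v is not None)
        if not counts:
            raise ValueError("cannot fit on a column with no non-null values")
        # The fitted parameter the issue is concerned about: one entry per
        # distinct non-null value, so it grows with the column's cardinality.
        self.values = list(counts.keys())
        self.weights = list(counts.values())
        return self

    def transform(self, column: Iterable[Optional[Hashable]]) -> List[Hashable]:
        # Non-null values pass through unchanged; nulls are sampled.
        return [
            v if v is not None
            else self._rng.choices(self.values, weights=self.weights, k=1)[0]
            for v in column
        ]


imputer = RandomObservedValueImputer(seed=0).fit([1, 2, 2, None])
print(imputer.transform([1, None, 2, None]))
```

The size of `values`/`weights` is exactly what makes this awkward as a fitted parameter on a high-cardinality column such as a raw date.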

Additional context: This comes up when the HourOfDay circular representation of MM-DD-YYYY dates (which always fall at 00h00m00s) is not thrown out by SanityChecker, because its variance is not 0: non-null midnight values are encoded as (1, 0), while nulls are encoded as (0, 0).
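
The variance effect can be reproduced with a small sketch. It is a hypothetical Python illustration, assuming the hour of day h is encoded as (cos(2πh/24), sin(2πh/24)) and that SanityChecker removes a column whose variance is 0; `hour_to_unit_circle` and `column_variances` are made-up names, not TransmogrifAI APIs.

```python
import math
from statistics import pvariance
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def hour_to_unit_circle(hour: Optional[int], null_default: Point) -> Point:
    """Map an hour of day onto the unit circle; nulls get `null_default`."""
    if hour is None:
        return null_default
    angle = 2 * math.pi * hour / 24
    return (math.cos(angle), math.sin(angle))


def column_variances(hours: List[Optional[int]], null_default: Point) -> Tuple[float, float]:
    """Population variance of the x and y columns produced by the encoding."""
    points = [hour_to_unit_circle(h, null_default) for h in hours]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (pvariance(xs), pvariance(ys))


# Dates parsed from MM-DD-YYYY strings all land at 00h00m00s, i.e. hour 0,
# which encodes to (cos 0, sin 0) = (1, 0).
midnight_hours = [0, 0, None, 0, None]

# Null default (0, 0): the x column mixes 1s and 0s, so its variance is > 0
# and a zero-variance check would keep this uninformative feature.
print(column_variances(midnight_hours, null_default=(0.0, 0.0)))

# Null default (1, 0): both columns are constant, so both variances are 0.
print(column_variances(midnight_hours, null_default=(1.0, 0.0)))
```

One consequence of the (1, 0) default is that a null and a genuine midnight become indistinguishable in these two columns.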

michaelweilsalesforce, Jun 07 '21 14:06

Thanks for the contribution! It looks like @mweilsalesforce is an internal user so signing the CLA is not required. However, we need to confirm this.

salesforce-cla[bot], Jun 07 '21 14:06

Codecov Report

Merging #555 (49425a8) into master (8dc772a) will decrease coverage by 0.00%. The diff coverage is 100.00%.

❗ Current head 49425a8 differs from pull request most recent head 0cab7fb. Consider uploading reports for the commit 0cab7fb to get more accurate results.

@@            Coverage Diff             @@
##           master     #555      +/-   ##
==========================================
- Coverage   86.78%   86.78%   -0.01%     
==========================================
  Files         347      347              
  Lines       12026    12024       -2     
  Branches      403      387      -16     
==========================================
- Hits        10437    10435       -2     
  Misses       1589     1589              
Impacted Files Coverage Δ
...s/impl/feature/DateMapToUnitCircleVectorizer.scala 100.00% <100.00%> (ø)
...ges/impl/feature/DateToUnitCircleTransformer.scala 100.00% <100.00%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update f82a301...0cab7fb.

codecov[bot], Jun 07 '21 14:06

How do I get rid of this CLA stuff?

michaelweilsalesforce, Jun 07 '21 18:06

@michaelweilsalesforce did you fill out the form?

TuanNguyen27, Jun 07 '21 21:06

Yes.

michaelweilsalesforce, Jun 07 '21 21:06

kicking cla bot

anish, Jun 15 '21 19:06

@michaelweilsalesforce do you have your Salesforce email associated with @mweilsalesforce as well? Or did you accidentally switch accounts and not realize it? The CLA bot is seeing two authors, and it needs both authors to sign the CLA.

anish, Jun 15 '21 19:06

Both emails are [email protected] and [email protected]

michaelweilsalesforce, Jun 15 '21 21:06

Somehow I couldn't attach my Salesforce account because it was linked to GitHub Enterprise. That's why I created this other account.

michaelweilsalesforce, Jun 15 '21 21:06