TransmogrifAI

Add Integer Feature Type

Open mrunfeldt opened this issue 2 years ago • 4 comments

In conversion from data type to OP type and back, the FeatureSparkTypes class in TMOG converts Int data to Long. This causes failures downstream when the types do not match up. This has become a problem now because we are splitting AutoML into 3 separate, modular stages (DataPrep, FeatEng, Modeling).
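The round trip described above can be pictured with a small toy model. This is only a sketch: the two mapping tables and the column names are hypothetical stand-ins for illustration and are not the actual FeatureSparkTypes API. It assumes that both Int and Long map to one integral feature type, and that this feature type maps back to Long only.

```python
# Toy model only: these mappings are hypothetical, not TransmogrifAI's API.
# Assumption: Int and Long both become "Integral", which converts back to Long.
DATA_TO_FEATURE = {"int": "Integral", "long": "Integral"}
FEATURE_TO_DATA = {"Integral": "long"}


def round_trip(schema):
    """Map each column's data type to a feature type and back again."""
    return {name: FEATURE_TO_DATA[DATA_TO_FEATURE[dtype]]
            for name, dtype in schema.items()}


def mismatches(expected, actual):
    """Return columns whose type changed, as name -> (expected, actual)."""
    return {name: (expected[name], actual[name])
            for name in expected
            if expected[name] != actual[name]}


# Hypothetical input schema for a stage.
input_schema = {"age": "int", "user_id": "long"}
output_schema = round_trip(input_schema)

changed = mismatches(input_schema, output_schema)
print(changed)  # {'age': ('int', 'long')}
```

A downstream stage that still expects `age` to be `int` would see `long` instead, which is the kind of mismatch reported here.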

Describe the proposed solution: Add an Integer feature type. This adds a lot of code to the Tmog dependency, but keeps the logic in modular AutoML clean (we don't have to track type conversions).

Describe alternatives you've considered: When the DataPrepStage converts Int features to Long, we can just convert them all back to Int. This would require no changes to Tmog (delete this PR) and only a small code change in AutoML, but then we will always

Additional context: Any insight into why an Integral[Long] feature was used for input data of type Int would be helpful. My guess is that there was no need to maintain the input type when this was all one stage (output is always scores and metrics), so it made sense to use the type with more bits.

mrunfeldt avatar Sep 30 '21 21:09 mrunfeldt

Hello @tovbinm @leahmcguire. We'd love your input on this if you find some time.

mrunfeldt avatar Oct 04 '21 18:10 mrunfeldt

The original reason to only have Integral (backed by Long) was to minimize the number of types we had to manage, since each type comes with a cost. Since Long can contain Int values, but not vice versa, we thought it would serve us well for both scenarios.

I would like to know more about what kind of downstream problems you're experiencing. Perhaps we can discuss this in a private conversation?

tovbinm avatar Oct 07 '21 15:10 tovbinm

If you are moving in and out of Tmog, it makes sense to add this. When writing code that interacted with pure Spark, I wrote converters back and forth in the past, but that requires you to realize that you are going to get this type-definition error when you move back and forth. This is certainly the cleaner solution. One thing to check is whether you have looked through all of the existing types to ensure that this doesn't change (and should not change) the inheritance model.
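For comparison, a hand-written converter of the kind mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions, not the converters that were actually written: schemas are modelled as plain dicts of column name to type name, and `restore_int_columns` is a hypothetical helper. It relabels columns that started as Int and came back as Long, after checking that every value still fits in 32 bits.

```python
# Hypothetical helper for illustration; not part of TransmogrifAI or Spark.
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # range of a 32-bit signed integer


def restore_int_columns(original_schema, current_schema, rows):
    """Relabel columns that began as int and came back as long."""
    widened = [name for name, dtype in original_schema.items()
               if dtype == "int" and current_schema.get(name) == "long"]
    # Narrowing is only safe if every value fits in the int range.
    for row in rows:
        for name in widened:
            value = row.get(name)
            if value is not None and not INT_MIN <= value <= INT_MAX:
                raise OverflowError(f"{name}={value} does not fit in a 32-bit int")
    restored = dict(current_schema)
    restored.update({name: "int" for name in widened})
    return restored


original = {"age": "int", "user_id": "long"}
after_stage = {"age": "long", "user_id": "long"}
rows = [{"age": 41, "user_id": 9_000_000_000}]

restored = restore_int_columns(original, after_stage, rows)
print(restored)  # {'age': 'int', 'user_id': 'long'}
```

The catch is that the caller has to carry `original` across the stage boundary, which is the bookkeeping a dedicated Integer feature type would avoid.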

leahmcguire avatar Oct 07 '21 20:10 leahmcguire

Thank you so very much for your feedback, Leah and Matthew!

Matthew, if you'd like to discuss more, please email me and we can set up a time for a hangout or just talk via email: [email protected]

mrunfeldt avatar Oct 07 '21 20:10 mrunfeldt