
Create DataCheck for Unknown types

Open bchen1116 opened this issue 3 years ago • 2 comments

This issue was brought up here as we integrate the new Woodwork (WW) update into EvalML. Primarily, we want to raise a data check warning or error when the dataset a user passes in has a large number of Unknown-typed columns, since we drop these columns in AutoMLSearch.

bchen1116, Jul 08 '21 22:07

@bchen1116 got it, agreed having a data check which raises a warning if there's a ton of unknown-typed features would be helpful. Not required to support unknown types in evalml though, right?

dsherry, Jul 14 '21 19:07

We agree it would be helpful to have a data check which alerts users if they're trying to model with any unknown-typed columns. The "unknown" type is intended to designate the case where type inference was unable to determine the most likely type of the column and the user must tell our code what type that column should have.

So, two thoughts:

  • Write a data check that raises an error when one or more of the provided features have the unknown type
  • Explore how we could encode unknown-typed features as all possible types. Example: an unknown string-typed feature gets represented in AutoML as both natural language and categorical, and AutoML can discern which is most useful
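
To make the two ideas concrete, here is a rough sketch in plain Python. It does not use the EvalML or Woodwork APIs: the schema is a hypothetical dict mapping column names to logical type names, and `check_unknown_types`, `candidate_types`, and the message format are names invented for illustration only.

```python
from typing import Dict, List, Sequence

# Assumption: a simplified stand-in for a typed schema,
# mapping column name -> logical type name.
UNKNOWN = "Unknown"


def check_unknown_types(logical_types: Dict[str, str]) -> List[dict]:
    """Return one error entry if any column has the Unknown type (idea 1)."""
    unknown_cols = [col for col, ltype in logical_types.items() if ltype == UNKNOWN]
    if not unknown_cols:
        return []
    return [
        {
            "level": "error",
            "message": f"{len(unknown_cols)} column(s) have Unknown type: {unknown_cols}",
            "columns": unknown_cols,
        }
    ]


def candidate_types(
    logical_types: Dict[str, str],
    candidates: Sequence[str] = ("NaturalLanguage", "Categorical"),
) -> Dict[str, List[str]]:
    """Map each Unknown column to every candidate type to try (idea 2).

    Columns with a known type keep that single type.
    """
    return {
        col: list(candidates) if ltype == UNKNOWN else [ltype]
        for col, ltype in logical_types.items()
    }


# Hypothetical schema with one Unknown column.
schema = {"age": "Integer", "notes": "Unknown", "city": "Categorical"}
errors = check_unknown_types(schema)
expanded = candidate_types(schema)
```

With this schema, `errors` holds a single entry that flags `notes`, and `expanded` maps `notes` to both candidate types so a search could try each representation and keep whichever performs better.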

dsherry, Oct 19 '21 17:10