evalml
Create DataCheck for Unknown types
This issue was brought up here as we integrate the new Woodwork (WW) update into EvalML. Primarily, we want to raise a data check warning or error when the dataset a user passes in has a large number of Unknown-typed columns, since we drop these columns in AutoMLSearch.
@bchen1116 got it, agreed having a data check which raises a warning if there's a ton of unknown-typed features would be helpful. Not required to support unknown types in evalml though, right?
We agree it would be helpful to have a data check which alerts users if they're trying to model with any unknown-typed columns. The "unknown" type is intended to designate the case where type inference was unable to determine the most likely type of the column and the user must tell our code what type that column should have.
So, two thoughts:
- Write a data check that raises an error when one or more of the provided features are of unknown type
- Explore how we could encode unknown-typed features as all possible types. Example: an unknown string-typed feature gets represented in automl as both natural language and categorical, and automl can discern which is most useful
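
For the first thought, here is a rough sketch of what such a check could look like. This is not EvalML's actual data check API: `check_unknown_types`, `UnknownTypeMessage`, and the plain `{column name: logical type name}` dict are all hypothetical stand-ins, assuming the logical type names have already been pulled out of the Woodwork schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UnknownTypeMessage:
    """Hypothetical container for one data check result."""
    level: str          # "warning" or "error"
    message: str
    columns: List[str]  # names of the offending columns


def check_unknown_types(
    logical_types: Dict[str, str], level: str = "warning"
) -> List[UnknownTypeMessage]:
    """Flag every column whose inferred logical type is "Unknown".

    `logical_types` maps column name -> logical type name. This is an
    assumed input shape for illustration, not a real EvalML signature.
    """
    if level not in ("warning", "error"):
        raise ValueError(f"level must be 'warning' or 'error', got {level!r}")

    # Compare case-insensitively so "Unknown" and "unknown" both match.
    unknown_cols = [
        col for col, ltype in logical_types.items() if ltype.lower() == "unknown"
    ]
    if not unknown_cols:
        return []

    message = (
        f"{len(unknown_cols)} column(s) have an Unknown type and need an "
        f"explicit type: {', '.join(unknown_cols)}"
    )
    return [UnknownTypeMessage(level=level, message=message, columns=unknown_cols)]


# Example: one of the three columns could not be inferred.
schema = {"age": "Integer", "notes": "Unknown", "city": "Categorical"}
for result in check_unknown_types(schema, level="error"):
    print(f"[{result.level}] {result.message}")
```

Keeping `level` as a parameter leaves the warning-versus-error question open, and a threshold on the fraction of unknown columns could be added the same way if we only want to flag datasets with a lot of them.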