
ColumnSchema `is_numeric`, `is_datetime`, etc. properties are hidden and not fully explained

Open · tamargrey opened this issue 3 years ago · 0 comments

Woodwork `ColumnSchema` objects have several properties, `is_numeric`, `is_boolean`, `is_categorical`, and `is_datetime`, that can be used to determine what operations can be performed on a column. These properties behave slightly differently from checking `'numeric' in col.semantic_tags` or `isinstance(col.logical_type, Boolean)`, and we never explain what those differences are or why a user should use these properties.

For example:

The benefit of using `is_numeric` over checking `'numeric' in col.semantic_tags` is that `is_numeric` looks for the `'numeric'` tag in the standard tags of the column's logical type. So even if the `'numeric'` tag has been removed from a column's semantic tags (say, because it is the index column), we can still tell that the underlying data is stored as numbers. This is good for deciding whether a statistical operation can be performed on a column, but not for understanding what the user has said the data means (say, in featuretools' primitive matching).
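To make the difference concrete, here is a minimal sketch in plain Python. The `LogicalType`, `Integer`, and `ColumnSchema` classes below are hypothetical stand-ins that model only the behavior described above; they are not Woodwork's actual classes, which carry more attributes and checks.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins modelling only the behavior described in this issue;
# these are NOT Woodwork's real LogicalType / ColumnSchema classes.
class LogicalType:
    standard_tags: frozenset = frozenset()


class Integer(LogicalType):
    # The logical type itself carries 'numeric' as a standard tag.
    standard_tags = frozenset({"numeric"})


@dataclass
class ColumnSchema:
    logical_type: LogicalType
    # Semantic tags as set (and possibly edited) by the user.
    semantic_tags: set = field(default_factory=set)

    @property
    def is_numeric(self) -> bool:
        # Looks at the logical type's standard tags, so user edits to
        # semantic_tags do not change the answer.
        return "numeric" in self.logical_type.standard_tags


# An integer index column: 'numeric' was replaced by 'index' in its semantic tags.
index_col = ColumnSchema(Integer(), semantic_tags={"index"})

print("numeric" in index_col.semantic_tags)  # False: what the user says the data means
print(index_col.is_numeric)                  # True: how the data is actually stored
```

A statistics routine would check `is_numeric`, while something like primitive matching would check `semantic_tags`, since only the latter reflects the user's intent.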

We should add a section on these properties that explains more about when to use them. This might fit nicely in the Working with Types and Tags Guide or possibly in an FAQ.

Also, since these properties exist only on the `ColumnSchema`, users who work only with the column accessor may never think to look for them through `series.ww._schema.is_numeric`. We should consider exposing them on the column accessor as `series.ww.is_numeric`.
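One way the accessor could expose these is to forward each property to its schema. The sketch below uses a hypothetical `Schema` dataclass and `ColumnAccessor` class in plain Python; it illustrates the delegation pattern only and is not Woodwork's actual accessor code.

```python
from dataclasses import dataclass


# Hypothetical minimal schema; in Woodwork the real object is a ColumnSchema.
@dataclass
class Schema:
    is_numeric: bool = False
    is_boolean: bool = False
    is_categorical: bool = False
    is_datetime: bool = False


class ColumnAccessor:
    """Hypothetical column accessor that forwards the is_* checks to its schema."""

    def __init__(self, schema: Schema):
        self._schema = schema

    # Explicit properties (rather than __getattr__ forwarding) show up in
    # dir(), tab completion, and generated API docs.
    @property
    def is_numeric(self) -> bool:
        return self._schema.is_numeric

    @property
    def is_boolean(self) -> bool:
        return self._schema.is_boolean

    @property
    def is_categorical(self) -> bool:
        return self._schema.is_categorical

    @property
    def is_datetime(self) -> bool:
        return self._schema.is_datetime


accessor = ColumnAccessor(Schema(is_numeric=True))
print(accessor.is_numeric)   # True
print(accessor.is_datetime)  # False
```

Explicit properties are chosen here because the problem is discoverability: a generic `__getattr__` forwarder would work at runtime but would keep the properties just as hidden from users browsing the accessor.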

tamargrey, Aug 24 '21 14:08