woodwork
ColumnSchema `is_numeric`, `is_datetime`, etc. properties are hidden and not fully explained
Woodwork `ColumnSchema` objects have several properties (`is_numeric`, `is_boolean`, `is_categorical`, and `is_datetime`) that can be used to determine what operations can be performed on a column. Using these behaves slightly differently from checking `'numeric' in col.semantic_tags` or `isinstance(col.logical_type, Boolean)`, and we never go into detail on what those differences are or why a user should use these properties.
For example:
The benefit of using `is_numeric` over checking `'numeric' in col.semantic_tags` is that `is_numeric` checks the standard tags of the column's logical type for the `'numeric'` tag. So even if a column has had the `'numeric'` tag removed (say it's the index column), we can still know that the underlying data is stored as numbers. This is good for determining whether a statistical operation can be performed on a column, but it is less useful when you're trying to understand what the user has said the data means (say, in Featuretools' primitive matching).
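To make the difference concrete, here is a minimal sketch that models the two checks with plain Python classes. `ToyLogicalType` and `ToyColumnSchema` are hypothetical stand-ins written for this illustration, not Woodwork's actual classes; the only behavior assumed is the one described above, namely that `is_numeric` looks at the logical type's standard tags rather than the column's current semantic tags.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set


@dataclass(frozen=True)
class ToyLogicalType:
    """Hypothetical stand-in for a logical type and its standard tags."""
    name: str
    standard_tags: FrozenSet[str] = frozenset()


@dataclass
class ToyColumnSchema:
    """Hypothetical stand-in for a column schema (not Woodwork's class)."""
    logical_type: ToyLogicalType
    semantic_tags: Set[str] = field(default_factory=set)

    @property
    def is_numeric(self) -> bool:
        # Consult the logical type's standard tags, not the tags
        # currently attached to the column.
        return "numeric" in self.logical_type.standard_tags


integer = ToyLogicalType("Integer", frozenset({"numeric"}))

# A regular integer column keeps its logical type's standard tags.
age = ToyColumnSchema(integer, semantic_tags={"numeric"})

# An integer index column: 'numeric' has been replaced by 'index'.
customer_id = ToyColumnSchema(integer, semantic_tags={"index"})

tag_check = "numeric" in customer_id.semantic_tags  # what the user said
property_check = customer_id.is_numeric             # how the data is stored
```

In this sketch the tag check answers "did the user mark this column as numeric?", while the property answers "is the underlying data numeric?". For the index column the two disagree, which is the case the documentation should call out.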
We should add a section on these properties that explains more about when to use them. This might fit nicely in the Working with Types and Tags guide or possibly in an FAQ.
Also, since these properties are only on the `ColumnSchema`, users who work only with the column accessor may never think to look for them through `series.ww._schema.is_numeric`. We should consider exposing them on the column accessor via `series.ww.is_numeric`.
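One possible shape for that change is a set of read-only properties on the accessor that forward to the schema. The sketch below uses hypothetical `ToySchema` and `ToyColumnAccessor` classes to show the delegation pattern only; it is not Woodwork's accessor code, and the real change would need to cover all four properties.

```python
class ToySchema:
    """Hypothetical schema holding the type-check results."""

    def __init__(self, is_numeric: bool, is_datetime: bool = False):
        self.is_numeric = is_numeric
        self.is_datetime = is_datetime


class ToyColumnAccessor:
    """Hypothetical accessor, standing in for what `series.ww` returns."""

    def __init__(self, schema: ToySchema):
        self._schema = schema

    @property
    def is_numeric(self) -> bool:
        # Forward to the schema so callers never touch the private attribute.
        return self._schema.is_numeric

    @property
    def is_datetime(self) -> bool:
        return self._schema.is_datetime


accessor = ToyColumnAccessor(ToySchema(is_numeric=True))

via_private = accessor._schema.is_numeric  # what users must do today
via_public = accessor.is_numeric           # the proposed public spelling
```

Keeping the properties as thin forwards means the schema remains the single source of truth, so the accessor and the schema cannot drift apart.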