featuretools
Expand guide on using ColumnSchemas in creating custom primitives
While converting Primitives to use Woodwork for their input and return types, there seem to be some common optimizations that can/should be used to define the best input and return types for a primitive.
There isn't much discussion of how a user can ensure that they're following these principles. The Woodwork in Featuretools guide explains how `ColumnSchema` objects get used as input and return types, but the Feature Primitives doc might be the best place to explain how to make the best use of `ColumnSchema` objects.
The tips I can think of right now are:
- Specify the most specific return type possible. If you know the result will be an integer, specify that instead of just the `'numeric'` tag.
- Prefer nullable return types (for example, `IntegerNullable` rather than `Integer`), since they won't cause errors during feature matrix calculation when the output contains missing values.
- If specifying a Logical Type in return types, you should also specify any desired standard tags (like `'numeric'` for a `Double` feature) so the feature can be considered for generic numeric inputs. `ColumnSchema` objects won't implicitly add the `'numeric'` tag; that's an accessor behavior.
@tamargrey I agree that an in-depth guide to writing primitives with `ColumnSchema` will be useful.
We can prioritize this after we release Featuretools v1.0.0.
It'd also be useful if this section explained how to define `ColumnSchema` objects for the `return_types` parameter in `dfs`. We'd want to avoid users specifying column schemas that are too restrictive, not restrictive enough, or redundant, so we'll want to be really clear about how the column schemas in `return_types` will get used.