Expand guide on using ColumnSchemas in creating custom primitives

Open tamargrey opened this issue 3 years ago • 2 comments

While converting primitives to use Woodwork for their input and return types, there seem to be some common optimizations that can and should be used to define the best input and return types for a primitive.

There isn't much discussion of how users can ensure that they're following these principles. The Woodwork in Featuretools guide explains how ColumnSchemas get used as input and return types, but the Feature Primitives doc might be the best place to explain how best to use ColumnSchema objects.

The tips I can think of right now are:

  • Specify the most specific return type possible. If you know the result will be an integer, specify that instead of just the 'numeric' tag.
  • Prefer nullable return types, as they won't result in errors during feature matrix calculation.
  • If you specify a Logical Type in a return type, also specify any desired standard tags (like 'numeric' for a Double feature) so that the feature can be considered for generic numeric inputs. ColumnSchema objects won't implicitly add the 'numeric' tag; that's an accessor behavior.

tamargrey avatar Aug 23 '21 16:08 tamargrey

@tamargrey I agree that an in-depth guide to writing primitives with ColumnSchema will be useful.

We can prioritize this after we release Featuretools v1.0.0

gsheni avatar Aug 23 '21 19:08 gsheni

It'd also be useful if this section explained how to define ColumnSchema objects for the return_types parameter in dfs. We'd want to avoid users specifying column schemas that are too restrictive, not restrictive enough, or redundant, so we'll want to be really clear about how column schemas in return_types get used.

tamargrey avatar Sep 20 '21 14:09 tamargrey