featuretools
allow where_primitives to function independently of agg_primitives
I am using the official customer churn prediction example from here. For quick experimentation, I added a cell between cell 19 and cell 20 to subset cutoff_times to only two msno values (IDs), like so:
cutoff_times_=cutoff_times.iloc[[33,34,21,22],:].reset_index(drop=True)
cutoff_times_ = cutoff_times_.rename(columns={'cutoff_time':'time'})
Then, in cell 20, I notice that no where-clause features are created for any primitive in set(where_primitives) - set(agg_primitives). I also get warnings.warn(warning_msg, UnusedPrimitiveWarning) for every primitive that is in the where_primitives list but not in the agg_primitives list.
Attaching a few examples (I changed max_depth to 10 to make sure that insufficient depth is not the cause):
feature_defs, _ = ft.dfs(
    entityset=es,
    target_entity='members',
    agg_primitives=[],
    trans_primitives=['month'],
    cutoff_time_in_index=True,
    cutoff_time=cutoff_times_,
    where_primitives=['max'],
    max_depth=10,
    features_only=False,
)
output:
/home/nitin/miniconda3/envs/featuretools/lib/python3.9/site-packages/featuretools/synthesis/dfs.py:307: UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
where_primitives: ['max']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.
warnings.warn(warning_msg, UnusedPrimitiveWarning)
feature_defs, _ = ft.dfs(
    entityset=es,
    target_entity='members',
    agg_primitives=['sum'],
    trans_primitives=['month'],
    cutoff_time_in_index=True,
    cutoff_time=cutoff_times_,
    where_primitives=['max', 'min'],
    max_depth=10,
    features_only=False,
)
output:
/home/nitin/miniconda3/envs/featuretools/lib/python3.9/site-packages/featuretools/synthesis/dfs.py:307: UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
where_primitives: ['max', 'min']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.
warnings.warn(warning_msg, UnusedPrimitiveWarning)
feature_defs, _ = ft.dfs(
    entityset=es,
    target_entity='members',
    agg_primitives=['sum', 'min'],
    trans_primitives=['month'],
    cutoff_time_in_index=True,
    cutoff_time=cutoff_times_,
    where_primitives=['max', 'min'],
    max_depth=10,
    features_only=False,
)
output:
/home/nitin/miniconda3/envs/featuretools/lib/python3.9/site-packages/featuretools/synthesis/dfs.py:307: UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
where_primitives: ['max']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.
warnings.warn(warning_msg, UnusedPrimitiveWarning)
feature_defs, _ = ft.dfs(
    entityset=es,
    target_entity='members',
    agg_primitives=['sum', 'min', 'max'],
    trans_primitives=['month'],
    cutoff_time_in_index=True,
    cutoff_time=cutoff_times_,
    where_primitives=['max', 'min', 'sum', 'std'],
    max_depth=10,
    features_only=False,
)
output:
/home/nitin/miniconda3/envs/featuretools/lib/python3.9/site-packages/featuretools/synthesis/dfs.py:307: UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
where_primitives: ['std']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.
warnings.warn(warning_msg, UnusedPrimitiveWarning)
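The four runs above follow one pattern: the primitives named in each warning are exactly the members of set(where_primitives) - set(agg_primitives). Here is a minimal sketch in plain Python that reproduces the warned lists without calling featuretools. The helper name unused_where and the runs list are illustrative only, not part of the library, and the sketch does not model interesting values or max_depth.

```python
def unused_where(agg_primitives, where_primitives):
    """Return the where primitives that are missing from agg_primitives.

    Order follows where_primitives so the result is easy to compare
    with the list printed in the UnusedPrimitiveWarning.
    """
    agg = set(agg_primitives)
    return [p for p in where_primitives if p not in agg]


# (agg_primitives, where_primitives) pairs copied from the four dfs calls above.
runs = [
    ([], ["max"]),
    (["sum"], ["max", "min"]),
    (["sum", "min"], ["max", "min"]),
    (["sum", "min", "max"], ["max", "min", "sum", "std"]),
]

predicted = [unused_where(agg, where) for agg, where in runs]

for (agg, where), unused in zip(runs, predicted):
    print(f"agg={agg} where={where} -> expected unused: {unused}")
```

The predicted lists can be compared by eye with the where_primitives line in each warning above.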
SYSTEM INFO
python: 3.9.4.final.0
python-bits: 64
OS: Linux
OS-release: 5.4.0-74-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IN
LOCALE: en_IN.ISO8859-1
INSTALLED VERSIONS
numpy: 1.20.3
pandas: 1.2.4
tqdm: 4.61.1
PyYAML: 5.4.1
cloudpickle: 1.6.0
dask: 2021.6.0
distributed: 2021.6.0
psutil: 5.8.0
pip: 21.1.2
setuptools: 49.6.0.post20210108
Thanks for the question! To avoid the warning, the where_primitives should also be included in agg_primitives. Interesting values should also be set, as is done in cell 17 of the notebook example. In cell 20, the agg_primitives parameter is not set, so a default set of aggregation primitives is applied during DFS. All the where primitives in that DFS call are included in the default set of aggregation primitives. For reference, here is a quick reproducible example of the warning.
import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)
es['products']['brand'].interesting_values = ['A']
fm, fd = ft.dfs(
    entityset=es,
    target_entity='sessions',
    agg_primitives=[],
    trans_primitives=['month'],
    where_primitives=['max'],
)
UnusedPrimitiveWarning: Some specified primitives were not used during DFS:
where_primitives: ['max']
This may be caused by a using a value of max_depth that is too small, not setting interesting values, or it may indicate no compatible variable types for the primitive were found in the data.
warnings.warn(warning_msg, UnusedPrimitiveWarning)
When you remove agg_primitives from the DFS call, the default set of aggregation primitives is applied. The where primitive is included in the default aggregation primitives, so the warning no longer appears.
fm, fd = ft.dfs(
    entityset=es,
    target_entity='sessions',
    trans_primitives=['month'],
    where_primitives=['max'],
)
The full list of default aggregation primitives is given in the docstring for featuretools.dfs:
agg_primitives (list[str or AggregationPrimitive], optional): List of Aggregation
Feature types to apply.
Default: ["sum", "std", "max", "skew", "min", "mean", "count", "percent_true", "num_unique", "mode"]
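To make the difference between omitting agg_primitives and passing an empty list concrete, here is a rough sketch in plain Python. It assumes, based on the two calls above, that only an omitted argument falls back to the defaults while an explicit [] stays empty. The names DEFAULT_AGG_PRIMITIVES, effective_agg_primitives and uncovered_where are made up for illustration and are not featuretools APIs. The default list is copied from the docstring above and may differ in other versions.

```python
# Copied from the featuretools.dfs docstring quoted above (version-dependent).
DEFAULT_AGG_PRIMITIVES = [
    "sum", "std", "max", "skew", "min",
    "mean", "count", "percent_true", "num_unique", "mode",
]


def effective_agg_primitives(agg_primitives=None):
    """Return the aggregation primitives a call would presumably use.

    Assumption: None (argument omitted) means "use the defaults",
    while an explicit empty list means "use no aggregation primitives".
    """
    if agg_primitives is None:
        return list(DEFAULT_AGG_PRIMITIVES)
    return list(agg_primitives)


def uncovered_where(where_primitives, agg_primitives=None):
    """Return where primitives not covered by the effective agg primitives."""
    agg = set(effective_agg_primitives(agg_primitives))
    return [p for p in where_primitives if p not in agg]


# Mirrors the two calls above: explicit [] versus agg_primitives omitted.
with_empty_list = uncovered_where(["max"], agg_primitives=[])
with_defaults = uncovered_where(["max"])

print("agg_primitives=[]      ->", with_empty_list)
print("agg_primitives omitted ->", with_defaults)
```

A check like this could be run before ft.dfs to see which where primitives would likely trigger the warning, so they can be added to agg_primitives first.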
Let me know if this helps.
Thank you for your help, Jeff.
I figured as much: where_primitives need to be a subset of agg_primitives.
Do you also think that this limits the freedom with which featuretools can be used? I use featuretools extensively (everyone involved in creating featuretools has my respect, gratitude and love. You guys rock!). I so often come across scenarios where I want to apply a primitive only together with a certain where clause that I think it would be useful to have this additional dimension of control over how primitives are applied.
Thanks for clarifying! I think that would be a great request. Is this related to #1513?
Yes. That is right. The two issues are:
- No way to control the where_primitives application #1513
- where_primitives need to be a subset of agg_primitives #1514
Hence I opened separate issues.