open_source_demos
create features on one dataset
I have tried to create automated features using only one dataset, but it doesn't work. Does this mean I can only use Featuretools when I have two or more datasets? The code is below:
import featuretools as ft
# create the entity set
es = ft.EntitySet(id='clients')
# create an entity from the dataset
es = es.entity_from_dataframe(entity_id='app', dataframe=data, index='customerid')
# Default primitives from featuretools
default_agg_primitives = ["sum", "std", "max", "skew", "min", "mean", "count", "percent_true", "num_unique", "mode"]
default_trans_primitives = ["day", "year", "month", "weekday", "haversine", "numwords", "characters"]
# DFS with specified primitives
feature_matrix, feature_names = ft.dfs(entityset=es, target_entity='app',
                                       trans_primitives=default_trans_primitives, agg_primitives=default_agg_primitives,
                                       max_depth=2, features_only=False, verbose=True)
print('%d Total Features' % len(feature_names))
This returns the same number of features as there are columns in the dataframe. No new features are created.
@billy-odera can you provide an example of a feature you would expect to get created using just that one table?
@kmax12 This is the dataframe
customerid age outflows_amout inflows_amount
1 28.00 0 355.00
2 72.00 1 240.00
3 22.00 6 nan
I would expect to get features such as the count, mean, and skew of the outflows amount, etc.
@billy-odera Not sure I follow your example. If you want to calculate the mean outflows_amount per customer, you would want to create a second entity for your customers that has a relationship to the entity with multiple rows per customer, each with a different outflows_amount. Let me know if that's helpful, or please provide a complete example of what you want to generate so I can better help.
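To make the point about needing a second, related table concrete, here is a minimal sketch in plain pandas (not Featuretools itself) of what an aggregation across a one-to-many relationship computes. The `transactions` table, its `transaction_id` column, and all of its values are hypothetical and made up for illustration; only `customerid` and `age` come from the dataframe posted above.

```python
import pandas as pd

# Parent table: one row per customer, as in the dataframe posted above.
customers = pd.DataFrame({
    "customerid": [1, 2, 3],
    "age": [28.0, 72.0, 22.0],
})

# Hypothetical child table: several rows per customer (made-up values).
transactions = pd.DataFrame({
    "transaction_id": [10, 11, 12, 13, 14, 15],
    "customerid": [1, 1, 2, 3, 3, 3],
    "outflows_amount": [5.0, 15.0, 7.0, 1.0, 2.0, 3.0],
})

# Aggregate the child rows per customer; this is only meaningful because
# each customer can have more than one transaction.
agg = (
    transactions.groupby("customerid")["outflows_amount"]
    .agg(["count", "mean", "max"])
    .add_prefix("outflows_amount_")
    .reset_index()
)

# Attach the aggregated features to the one-row-per-customer table.
features = customers.merge(agg, on="customerid", how="left")
print(features)
```

With a single table that already has one row per customer, every group would contain exactly one row, so a per-customer count, mean, or skew would carry no new information.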
Yes. I encounter the same problem.
Hi @shellwang,
Can you provide us with more details about your goals?
As Max says, you need more related tables to extract these kinds of features.
I believe the issue here is understanding the fundamentals of automated feature engineering. A transformation acts on a single table (in Python terms, a table is just a Pandas DataFrame) by creating new features out of one or more of the existing columns. Like many topics in machine learning, automated feature engineering is a complicated concept built on simple ideas. Using the concepts of entitysets, entities, and relationships, Featuretools can perform deep feature synthesis to create new features. Deep feature synthesis in turn stacks feature primitives to build new features from multiple tables. There are two kinds of primitives: aggregations, which act across a one-to-many relationship between tables, and transformations, which are functions applied to one or more columns in a single table.
Read more, with a basic example, here: https://towardsdatascience.com/automated-feature-engineering-in-python-99baf11cc219
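The difference between a transformation and a stacked aggregation can be sketched in plain pandas (again, this is not the Featuretools API, only an illustration of the idea). The `transactions` table, its `date` column, and all values are hypothetical. As far as I can tell, the transform primitives listed in the question target datetime, latitude/longitude, and text columns, which would explain why a purely numeric single table produces no new features.

```python
import pandas as pd

# Hypothetical transactions table (made-up values for illustration).
transactions = pd.DataFrame({
    "customerid": [1, 1, 2, 3, 3, 3],
    "date": pd.to_datetime([
        "2019-01-05", "2019-02-10", "2019-01-20",
        "2019-03-01", "2019-03-15", "2019-04-02",
    ]),
    "inflows_amount": [100.0, 255.0, 240.0, 10.0, 20.0, 30.0],
})

# Transformation: applied row by row within a single table.
# It needs a datetime column to work on.
transactions["month"] = transactions["date"].dt.month

# Aggregation stacked on top of the transformation (depth 2):
# the number of distinct transaction months per customer.
stacked = (
    transactions.groupby("customerid")["month"]
    .nunique()
    .rename("num_unique_month")
    .reset_index()
)
print(stacked)
```

The first step works on one table alone; the second only becomes useful once the rows can be grouped under a parent such as a customer.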