open_source_demos

create features on one dataset

Open billy-odera opened this issue 6 years ago • 6 comments

I have tried to create automated features using only one dataset, but it doesn't work. Does this mean I can only use featuretools when I have two or more datasets? The code is below:

import featuretools as ft

# create entity set
es = ft.EntitySet(id = 'clients')

# create entity of the dataset
es = es.entity_from_dataframe(entity_id = 'app', dataframe = data, index = 'customerid')

# Default primitives from featuretools
default_agg_primitives = ["sum", "std", "max", "skew", "min", "mean", "count", "percent_true", "num_unique", "mode"]
default_trans_primitives = ["day", "year", "month", "weekday", "haversine", "numwords", "characters"]

# DFS with specified primitives
feature_matrix, feature_names = ft.dfs(entityset = es, target_entity = 'app',
                                       trans_primitives = default_trans_primitives,
                                       agg_primitives = default_agg_primitives,
                                       max_depth = 2, features_only = False, verbose = True)

print('%d Total Features' % len(feature_names))

This returns the same number of features as the dataframe has columns. No new features are created.

billy-odera avatar Dec 07 '18 06:12 billy-odera

@billy-odera can you provide an example of a feature you would expect to get created using just that one table?

kmax12 avatar Dec 07 '18 15:12 kmax12

@kmax12 This is the dataframe

customerid   age     outflows_amout   inflows_amount
1            28.00   0                355.00
2            72.00   1                240.00
3            22.00   6                nan

I would expect to get features such as count.outflow_amount, mean, skew, etc.

billy-odera avatar Dec 09 '18 14:12 billy-odera

@billy-odera Not sure I follow your example. If you want to calculate the mean outflows_amount per customer, you would want to create a second entity for your customers that has a relationship to the entity with multiple rows per customer, each with a different outflows_amount. Let me know if that's helpful, or please provide a complete example of what you want to generate so I can better help.
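A minimal sketch of the two-table idea, written in plain Python rather than featuretools so it does not depend on any particular library version. The `transactions` rows and the `aggregate_per_customer` helper are hypothetical and not taken from the original post; they only illustrate what an aggregation across a one-to-many relationship computes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical child table: many rows per customer (invented for illustration).
transactions = [
    {"customerid": 1, "outflows_amount": 0.0},
    {"customerid": 1, "outflows_amount": 4.0},
    {"customerid": 2, "outflows_amount": 1.0},
    {"customerid": 2, "outflows_amount": 3.0},
    {"customerid": 2, "outflows_amount": 8.0},
    {"customerid": 3, "outflows_amount": 6.0},
]


def aggregate_per_customer(rows):
    """Group child rows by customerid and compute count, sum and mean."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["customerid"]].append(row["outflows_amount"])
    return {
        cid: {"count": len(vals), "sum": sum(vals), "mean": mean(vals)}
        for cid, vals in grouped.items()
    }


customer_features = aggregate_per_customer(transactions)
for cid, feats in sorted(customer_features.items()):
    print(cid, feats)
```

With only one row per customer, as in the dataframe from the question, every group holds a single value, so the count is always 1 and the mean equals the raw value. That is why aggregation primitives have nothing useful to add on a single table.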

kmax12 avatar Dec 09 '18 20:12 kmax12

Yes, I have run into the same problem.

shellwang avatar Feb 02 '19 16:02 shellwang

Hi @shellwang,

Can you provide us with more details about your goals?

As Max says, you need more related tables to extract these kinds of features.

bukosabino avatar Feb 02 '19 18:02 bukosabino

I believe the issue here is understanding the fundamentals of automated ML methods. A transformation acts on a single table (in Python terms, a table is just a pandas DataFrame) by creating new features out of one or more of the existing columns. Like many topics in machine learning, automated feature engineering is a complicated concept built on simple ideas. Using the concepts of entitysets, entities, and relationships, featuretools can perform deep feature synthesis to create new features. Deep feature synthesis in turn stacks feature primitives (aggregations, which act across a one-to-many relationship between tables, and transformations, which are functions applied to one or more columns in a single table) to build new features from multiple tables.

Read more, with a basic example, here: https://towardsdatascience.com/automated-feature-engineering-in-python-99baf11cc219
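To make the "transformation" half of that distinction concrete, here is a rough plain-Python sketch using the single table from the question. The `net_flow` feature and the `add_transform_features` helper are made up for illustration, and the column names are assumed to be `outflows_amount` and `inflows_amount`; this is a hand-written stand-in for the idea, not what featuretools runs internally.

```python
import math

# The single table from the question: one row per customer.
customers = [
    {"customerid": 1, "age": 28.0, "outflows_amount": 0.0, "inflows_amount": 355.0},
    {"customerid": 2, "age": 72.0, "outflows_amount": 1.0, "inflows_amount": 240.0},
    {"customerid": 3, "age": 22.0, "outflows_amount": 6.0, "inflows_amount": math.nan},
]


def add_transform_features(rows):
    """Return copies of the rows with a hypothetical net_flow column added."""
    out = []
    for row in rows:
        new_row = dict(row)
        # A transformation combines columns within the same row, so it needs
        # no second table. A missing (NaN) input stays NaN in the result.
        new_row["net_flow"] = row["inflows_amount"] - row["outflows_amount"]
        out.append(new_row)
    return out


features = add_transform_features(customers)
for row in features:
    print(row["customerid"], row["net_flow"])
```

A transformation like this keeps the number of rows unchanged and only adds columns, whereas an aggregation collapses many child rows into one value per parent row and therefore needs a related table.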

turkialjrees avatar Feb 03 '19 00:02 turkialjrees