featuretools
Adding unnamed DataFrames to EntitySets (and have Featuretools generate a name for df)
Bug/Feature Request Description
Currently, EvalML's DFSTransformer runs into issues when unnamed dataframes are passed to its .fit() method.
Calling DFSTransformer.fit() adds the dataframe to an EntitySet via the _make_entity_set() method, which uses the new add_dataframe() method from Featuretools 1.0.0. add_dataframe() now requires either:
1. a dataframe that is not Woodwork initialized, which it then initializes and names according to the dataframe_name parameter passed to add_dataframe(), or
2. a dataframe that is Woodwork initialized and named.
EvalML currently supports woodwork initialized dataframes that are unnamed.
Expected Output
We'd suggest the following changes:
if dataframe.ww.schema is None:
    ...
else:
    if dataframe.ww.name is None:
        if dataframe_name is None:
            raise ValueError('Cannot add a Woodwork DataFrame to EntitySet without a name or a proposed name')
        else:
            dataframe.ww.name = dataframe_name
    if dataframe.ww.index is None:
        if index is None or not make_index:
            raise ValueError('Cannot add Woodwork DataFrame to EntitySet without index')
        else:
            # do index stuff
            ...
Hopefully we could do something similar to accommodate woodwork initialized dataframes and just give them a name, per the add_dataframe() function.
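To make the proposed branch easier to experiment with, here is a minimal, runnable sketch that applies the same checks to a stand-in object instead of a real Woodwork dataframe. StubSchema and check_initialized_dataframe are hypothetical names used only for illustration; they are not part of Featuretools or Woodwork, and the index step is left open exactly as in the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubSchema:
    """Hypothetical stand-in for the `dataframe.ww` accessor (not a Woodwork class)."""

    name: Optional[str] = None
    index: Optional[str] = None


def check_initialized_dataframe(schema, dataframe_name=None, index=None, make_index=False):
    """Apply the proposed name and index checks to an already-initialized schema."""
    if schema.name is None:
        if dataframe_name is None:
            raise ValueError(
                "Cannot add a Woodwork DataFrame to EntitySet without a name or a proposed name"
            )
        # Proposed behaviour: adopt the name passed to add_dataframe().
        schema.name = dataframe_name
    if schema.index is None:
        # Same condition as the proposal: raise unless an index name is given
        # and make_index is truthy.
        if index is None or not make_index:
            raise ValueError("Cannot add Woodwork DataFrame to EntitySet without index")
        # The proposal leaves this step open ("do index stuff"), so nothing is done here.
    return schema


# An unnamed schema that already has an index picks up the proposed name.
schema = check_initialized_dataframe(StubSchema(index="id"), dataframe_name="customers")
print(schema.name)
```

With a real dataframe, the assignment to schema.name corresponds to setting the name on an initialized Woodwork dataframe, which is the capability this issue asks for.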
Output of featuretools.show_info()
Featuretools version: 1.0.0
Situations
- User passes WW initialized DF (with no name) and name to EntitySet creation. Featuretools should set the name on the dataframe.
- User passes WW initialized DF (with name) and name to EntitySet creation. Featuretools will throw an error.
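The two situations above, together with the existing requirement that some name must be available, can be summarized as a small decision function. This is only a sketch: resolve_dataframe_name is a hypothetical helper, the second error message is invented for illustration, and raising even when the two names match is an assumption, since the issue does not say how matching names should be treated.

```python
from typing import Optional


def resolve_dataframe_name(existing_name: Optional[str], proposed_name: Optional[str]) -> str:
    """Decide which name an already-initialized dataframe should carry."""
    if existing_name is None:
        if proposed_name is None:
            # Neither the dataframe nor the caller supplies a name.
            raise ValueError(
                "Cannot add a Woodwork DataFrame to EntitySet without a name or a proposed name"
            )
        # Situation 1: unnamed dataframe plus a proposed name.
        return proposed_name
    if proposed_name is not None:
        # Situation 2: dataframe already named and a name is also passed.
        # Hypothetical message; the issue only says an error is thrown.
        raise ValueError("DataFrame is already named; do not pass another name")
    # Named dataframe with no proposed name: keep the existing name.
    return existing_name
```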
Featuretools will need the ability to set the name on an initialized DF
- https://github.com/alteryx/woodwork/issues/1177
I think this issue is broader than just the name. If you look at the related EvalML code, they are also potentially passing index and make_index values when adding the dataframe to the EntitySet. This will also be problematic with the current implementation. The name error just happens to be the first one that gets raised.
@chukarsten Is this something we need to prioritize? Is it negatively affecting EvalML?