
Adding unnamed DataFrames to EntitySets (and having Featuretools generate a name for the df)

Status: Open. chukarsten opened this issue 3 years ago • 3 comments

Adding unnamed DataFrames to EntitySets


Bug/Feature Request Description

Currently, in EvalML's DFSTransformer, we run into issues when unnamed dataframes are passed to the DFSTransformer's .fit() method.

Calling DFSTransformer.fit() adds the dataframe to an EntitySet via the _make_entity_set() method, which uses Featuretools 1.0.0's new add_dataframe() method. add_dataframe() now requires either:

  1. a dataframe that is not Woodwork-initialized, in which case add_dataframe() initializes it and names it according to the dataframe_name parameter, or
  2. a dataframe that is Woodwork-initialized and already named.

EvalML currently supports Woodwork-initialized dataframes that are unnamed, and these do not meet the second requirement.
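A minimal pure-Python model of the Featuretools 1.0.0 rule described above may help show why EvalML's unnamed Woodwork dataframes are rejected. It does not call Featuretools; `FrameInfo` and `current_rule` are hypothetical names, and a dataframe is reduced to two facts: whether Woodwork is initialized and its Woodwork name (if any). The model ignores dataframe_name when the frame is already named; the issue does not say what Featuretools does in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameInfo:
    """Hypothetical stand-in for a dataframe: only the facts the rule inspects."""
    ww_initialized: bool
    ww_name: Optional[str] = None


def current_rule(frame: FrameInfo, dataframe_name: Optional[str]) -> str:
    """Return the name the dataframe ends up with, or raise ValueError.

    Models the two inputs add_dataframe() accepts in Featuretools 1.0.0,
    as described in this issue (an assumption, not the library code).
    """
    if not frame.ww_initialized:
        # Case 1: plain dataframe, initialized and named from dataframe_name
        if dataframe_name is None:
            raise ValueError("A plain dataframe needs dataframe_name")
        return dataframe_name
    # Case 2: a Woodwork-initialized dataframe must already carry a name;
    # a proposed dataframe_name does not help here
    if frame.ww_name is None:
        raise ValueError("Woodwork DataFrame has no name")
    return frame.ww_name
```

Under this model, `current_rule(FrameInfo(ww_initialized=True), "X")` raises even though a name was proposed, which is the EvalML case.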

Expected Output

We'd suggest the following changes:

        if dataframe.ww.schema is None:
            ...

        else:
            if dataframe.ww.name is None:
                if dataframe_name is None:
                    raise ValueError('Cannot add a Woodwork DataFrame to EntitySet without a name or a proposed name')
                else:
                    dataframe.ww.name = dataframe_name
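            if dataframe.ww.index is None:
                if index is None:
                    raise ValueError('Cannot add Woodwork DataFrame to EntitySet without index')
                else:
                    # do index stuff: create a new `index` column if make_index is True,
                    # otherwise set the existing `index` column as the Woodwork index
                    ...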

Hopefully we can do something similar to accommodate Woodwork-initialized dataframes and simply give them the name passed to add_dataframe().
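The proposed name and index handling can be exercised outside Featuretools with a small stand-in for the Woodwork schema. This is a sketch under assumptions: `WoodworkSchemaStub` and `apply_proposal` are hypothetical names, the stub records columns, name, and index instead of touching `dataframe.ww`, and inserting a column name stands in for actually creating an index column. It raises only when no index is given, so an existing column can be promoted to the index without make_index; that choice, and the column-existence checks, are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WoodworkSchemaStub:
    """Hypothetical stand-in for the Woodwork schema on a dataframe."""
    columns: List[str]
    name: Optional[str] = None
    index: Optional[str] = None


def apply_proposal(
    schema: WoodworkSchemaStub,
    dataframe_name: Optional[str] = None,
    index: Optional[str] = None,
    make_index: bool = False,
) -> WoodworkSchemaStub:
    """Apply the proposed name/index rules to a Woodwork-initialized frame.

    Mutates and returns the stub; a real patch would act on dataframe.ww.
    """
    if schema.name is None:
        if dataframe_name is None:
            raise ValueError("No Woodwork name and no proposed name")
        schema.name = dataframe_name  # give the unnamed frame the proposed name
    if schema.index is None:
        if index is None:
            raise ValueError("No Woodwork index and no index given")
        if make_index:
            if index in schema.columns:
                raise ValueError(f"Cannot make index: column {index!r} already exists")
            schema.columns.insert(0, index)  # stands in for creating the column
        elif index not in schema.columns:
            raise ValueError(f"Index column {index!r} not found")
        schema.index = index
    return schema
```

A frame that already has a Woodwork name and index passes through unchanged, so a proposed dataframe_name is silently ignored in that case.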

Output of featuretools.show_info()

Featuretools version: 1.0.0

chukarsten, Oct 14 '21 17:10

Situations

  1. User passes a WW-initialized DF (with no name) and a name when adding it to an EntitySet. Featuretools should set the name on the dataframe.
  2. User passes a WW-initialized DF (with a name) and a name when adding it to an EntitySet. Featuretools will raise an error.

Featuretools will need the ability to set the name on an initialized DF:

  • https://github.com/alteryx/woodwork/issues/1177
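The two situations above amount to a small name-resolution rule. Unlike the code proposed in the issue, which keeps an existing Woodwork name and ignores the passed name, Situation 2 raises. The sketch below is hypothetical (`resolve_name` is not a Featuretools function) and raises even when the two names match, since the thread does not say whether matching names should be allowed.

```python
from typing import Optional


def resolve_name(ww_name: Optional[str], dataframe_name: Optional[str]) -> str:
    """Hypothetical name resolution for a Woodwork-initialized DataFrame.

    Situation 1: no Woodwork name, name passed -> use the passed name.
    Situation 2: Woodwork name present and name passed -> raise an error.
    """
    if ww_name is None:
        if dataframe_name is None:
            raise ValueError("No Woodwork name and no dataframe_name given")
        return dataframe_name  # Situation 1
    if dataframe_name is not None:
        # Situation 2: refuse to override an existing Woodwork name
        raise ValueError("DataFrame is already named; do not pass dataframe_name")
    return ww_name
```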

gsheni, Oct 28 '21 20:10

I think this issue is broader than just the name. If you look at the related EvalML code, it also potentially passes index and make_index values when adding the dataframe to the EntitySet. That will also be problematic with the current implementation; the name error just happens to be the first one raised.

thehomebrewnerd, Oct 28 '21 21:10

@chukarsten Is this something we need to prioritize? Is it negatively affecting EvalML?

gsheni, Nov 12 '21 21:11