FAMD row_contributions

Open LNS98 opened this issue 4 years ago • 3 comments

I was trying to use row_contributions on FAMD to recover the contributions of each individual column to the principal components, but it throws an error. I wasn't sure whether that's because it hasn't been implemented yet?

LNS98, Aug 22 '19 13:08

Hey! Can you share some code?

MaxHalford, Aug 23 '19 17:08

I sadly don't have the time to make a minimal reproducible example here, but I figured I'd at least note that I'm also having the above issue and flesh it out with some details. The error I get is this:

KeyError: "None of [Index(['outlet_id', 'rain_drizzle', 'prod_txt'], dtype='object')] are in the [columns]"

Of note, those three columns are the only categorical variables in my data; apparently passing them to row_contributions fails. Here is a bit more detail about the example I'm using. The chunk of code that throws this error is below (note the commented-out conversion of the three categorical columns to strings; I tried that too, but it didn't fix this):

import prince

df = df.dropna()
y = df['total_quantity_sold']
X = df[['rain_drizzle', 'prod_txt', 'mean_temperature', 'min_temp', 'max_temp',
        'precipitation', 'outlet_id', 'SATURDAY', 'SUNDAY', 'visibility', 'snow_depth']]
# string_cols = ['rain_drizzle', 'prod_txt', 'outlet_id']
# X[string_cols] = X[string_cols].astype(str)

famd = prince.FAMD(
    n_components=X.shape[1],
    n_iter=10,
    copy=True,
    check_input=True,
    engine='auto',
    random_state=42)
famd = famd.fit(X)

# famd.explained_inertia_  # Note, when uncommented this line works fine
print(famd.row_contributions(X))

Also, a printout of the dtypes of the X dataframe:

rain_drizzle                    category
prod_txt                        category
mean_temperature               float64
min_temp                       float64
max_temp                       float64
precipitation                  float64
outlet_id                       category
SATURDAY                         int64
SUNDAY                           int64
visibility                     float64
snow_depth                     float64
dtype: object

I know that's not much more helpful than the above, but hopefully it's better than having no error message at all, as in the original report.

It's also quite possible I'm just using this wrong; it's hard to know, since this section of the documentation isn't complete yet.
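For anyone trying to narrow this down, here is a minimal pandas-only sketch of how that exact KeyError wording arises: it is what pandas raises when a list of column labels is requested from a frame that contains none of them. The toy values are made up, and whether row_contributions reaches this path internally (for example by selecting the categorical columns from a frame that no longer has them) is only an assumption, not something confirmed in this thread.

```python
import pandas as pd

# Toy stand-in for the real data; the values here are invented.
X = pd.DataFrame({
    "outlet_id": pd.Categorical(["a", "b", "a"]),
    "rain_drizzle": pd.Categorical(["0", "1", "0"]),
    "prod_txt": pd.Categorical(["x", "y", "x"]),
    "mean_temperature": [10.5, 12.0, 9.8],
    "SATURDAY": [0, 1, 0],
})

# Split the columns by dtype, mirroring the dtypes printout above.
categorical_cols = X.select_dtypes(include="category").columns.tolist()
numeric_cols = X.select_dtypes(include="number").columns.tolist()

# A frame that kept only the numeric columns.
numeric_only = X[numeric_cols]

# Asking that frame for the categorical labels fails, because none exist in it.
error_message = None
try:
    numeric_only[categorical_cols]
except KeyError as exc:
    error_message = str(exc)

print(error_message)
```

If the message printed here matches the one in the traceback, checking which columns exist on the frame at the failing line would be a reasonable next step.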

tomshaffner, Sep 09 '19 14:09

Same here

import pandas as pd
import prince

X = pd.DataFrame(
    data=[
        ['A', 'A', 'A', 2, 5, 7, 6, 3, 6, 7],
        ['A', 'A', 'A', 4, 4, 4, 2, 4, 4, 3],
        ['B', 'A', 'B', 5, 2, 1, 1, 7, 1, 1],
        ['B', 'A', 'B', 7, 2, 1, 2, 2, 2, 2],
        ['B', 'B', 'B', 3, 5, 6, 5, 2, 6, 6],
        ['B', 'B', 'A', 3, 5, 4, 5, 1, 7, 5]
    ],
    columns=['E1 fruity', 'E1 woody', 'E1 coffee',
             'E2 red fruit', 'E2 roasted', 'E2 vanillin', 'E2 woody',
             'E3 fruity', 'E3 butter', 'E3 woody'],
    index=['Wine {}'.format(i+1) for i in range(6)]
)
X['Oak type'] = [1, 2, 2, 2, 1, 1]


famd = prince.FAMD(
    n_components=2,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='auto',
    random_state=42
)

famd = famd.fit(X.drop('Oak type', axis='columns'))

famd.row_contributions(X)

dansleboby, Oct 13 '21 00:10

Hello there 👋

I apologise for not answering earlier. I was not maintaining Prince anymore. However, I have just refactored the entire codebase. This refactoring should have fixed many bugs.

I don't have time and energy to check if this fixes your issue, but there is a good chance it does. Feel free to reopen this issue if the problem persists after installing the new version (that is, version 0.8.0 and onwards).
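To check whether an environment already has the refactored release, a small standard-library sketch along these lines may help. The helper names parse_release and is_at_least are hypothetical, and the parsing is deliberately simple: it only reads the leading numeric parts of the version string and ignores pre-release suffixes.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Turn '0.8.0' into (0, 8, 0); stop at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed, minimum):
    """Compare two version strings after padding them to equal length."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = version("prince")  # distribution name as installed via pip
except PackageNotFoundError:
    installed = None

if installed is None:
    print("prince is not installed")
elif is_at_least(installed, "0.8.0"):
    print(f"prince {installed} includes the refactoring")
else:
    print(f"prince {installed} predates 0.8.0; consider upgrading")
```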

MaxHalford, Feb 27 '23 11:02