Bug: explicit casting to DataFrame forces incorrect column names

Open shntnu opened this issue 11 months ago • 1 comments

Example code with output

In normalize.normalize, this chunk explicitly recreates a pd.DataFrame from the transformed feature_df:

    feature_df = pd.DataFrame(
        fitted_scaler.transform(feature_df),
        columns=feature_df.columns,
        index=feature_df.index,
    )

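This fails when the column names of fitted_scaler.transform(feature_df) differ from feature_df.columns. Because the transformed output is itself a DataFrame, pandas selects the requested columns by label rather than by position, so any name in feature_df.columns that is missing from the transformed output becomes a column of NaN.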
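The behavior can be reproduced without pycytominer. The sketch below is only an illustration: the feature names and the `PC1`/`PC2` labels are made up, and `transformed` is a stand-in for whatever `fitted_scaler.transform(feature_df)` returns when the scaler renames its columns.

```python
import pandas as pd

# Hypothetical input features (names are illustrative only).
feature_df = pd.DataFrame({"feat_a": [1.0, 2.0], "feat_b": [3.0, 4.0]})

# Stand-in for a transform output whose columns were renamed by the scaler.
transformed = pd.DataFrame(
    feature_df.to_numpy(), columns=["PC1", "PC2"], index=feature_df.index
)

# Same pattern as in normalize.normalize: pandas looks up "feat_a" and
# "feat_b" by label in `transformed`, finds neither, and fills with NaN.
rebuilt = pd.DataFrame(
    transformed,
    columns=feature_df.columns,
    index=feature_df.index,
)

all_nan = bool(rebuilt.isna().all().all())
print(rebuilt)
```

No exception is raised here, so the problem may only surface later as missing values.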

Issue description

For now, I am using this temporary fix:

    feature_df = pd.DataFrame(
        fitted_scaler.transform(feature_df).values,
        columns=feature_df.columns,
        index=feature_df.index,
    )

This works because .values extracts an ndarray, which carries no column names, so the values are assigned to the columns by position instead of by label.
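A small sketch of why the workaround behaves differently, again using made-up names and a stand-in `transformed` frame rather than a real scaler:

```python
import pandas as pd

# Hypothetical input features (names are illustrative only).
feature_df = pd.DataFrame({"feat_a": [1.0, 2.0], "feat_b": [3.0, 4.0]})

# Stand-in for a transform output with renamed columns.
transformed = pd.DataFrame(
    [[10.0, 20.0], [30.0, 40.0]], columns=["PC1", "PC2"], index=feature_df.index
)

# .values yields a plain ndarray, so pandas assigns columns by position.
patched = pd.DataFrame(
    transformed.values,
    columns=feature_df.columns,
    index=feature_df.index,
)

print(patched)
```

The values survive, but they are labeled with the original feature names rather than the names the scaler produced.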

Expected behavior

If the scaler is Spherize and the method is PCA or PCA-cor, the column names will not be the same as those of the original DataFrame, so normalize should adapt to this case rather than reuse feature_df.columns.
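One possible way to adapt, sketched under assumptions: `rebuild_feature_df` is a hypothetical helper, not part of pycytominer, and it assumes the transform output has one row per row of feature_df. It keeps the scaler's own column names when the output is a DataFrame and falls back to the original names otherwise.

```python
import numpy as np
import pandas as pd


def rebuild_feature_df(transformed, feature_df):
    """Hypothetical helper: wrap a transform output without relabeling it."""
    if isinstance(transformed, pd.DataFrame):
        # Keep the column names chosen by the scaler; only align the index.
        out = transformed.copy()
        out.index = feature_df.index
        return out
    # Plain arrays carry no names, so reuse the original ones by position.
    return pd.DataFrame(
        np.asarray(transformed),
        columns=feature_df.columns,
        index=feature_df.index,
    )


# Illustrative data only.
feature_df = pd.DataFrame({"feat_a": [1.0, 2.0], "feat_b": [3.0, 4.0]})
renamed = pd.DataFrame([[0.1, 0.2], [0.3, 0.4]], columns=["PC1", "PC2"])

from_frame = rebuild_feature_df(renamed, feature_df)
from_array = rebuild_feature_df(renamed.to_numpy(), feature_df)
```

Whether downstream code expects the original names or the scaler's names is a design decision for the maintainers; this sketch only shows that the two cases can be told apart.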

Additional information

No response

shntnu, Sep 09 '23 20:09