Potential Issue with the ExcelFormer Example

Open mtreca opened this issue 1 year ago • 1 comment

Hi all!

While stepping through the example script for ExcelFormer, I noticed that this line fails with my custom dataset.

As far as I can tell, this happens because CatToNumTransform appends _{i} suffixes to the categorical feature names in its statistics, but does not rename the corresponding columns in the TensorFrame it outputs. As a result, the mutual_info_sort.transformed_stats passed to ExcelFormer on line 107 contains the suffixed _{i} categorical column names, while the actual TensorFrame still uses the original names.
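To make the mismatch concrete, here is a minimal sketch that compares the keys of a statistics dictionary with the column names of a frame. It uses plain Python stand-ins rather than real torch_frame objects: the helper find_stat_name_mismatches and the sample names (age, city, job) are hypothetical and only illustrate the kind of check that exposes the problem.

```python
def find_stat_name_mismatches(stats, col_names):
    """Return (stat keys with no matching column, columns with no stats)."""
    stat_keys = set(stats)
    columns = set(col_names)
    return sorted(stat_keys - columns), sorted(columns - stat_keys)


# Hypothetical stand-in data: the statistics use suffixed names,
# while the frame columns keep the original names.
stats = {"age": {}, "city_0": {}, "job_0": {}}
col_names = ["age", "city", "job"]

extra_stats, missing_stats = find_stat_name_mismatches(stats, col_names)
print(extra_stats)    # ['city_0', 'job_0']
print(missing_stats)  # ['city', 'job']
```

With a real dataset, the two inputs would presumably be the transformed statistics and the flattened column names of the transformed TensorFrame.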

Case in point: manually renaming the statistics back to their original names with this snippet fixes the issue:

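# cat_to_num is the fitted CatToNumTransform; note that this edits its stats dict in place.
fixed_stats = cat_to_num.transformed_stats
for cat_feature in categorical_feature_names:
    # Move the stats stored under "<name>_0" back to the original column name.
    stats = fixed_stats.pop(f"{cat_feature}_0")
    fixed_stats[cat_feature] = stats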

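That workaround probably won't work for non-binary classification tasks, though, where a categorical feature may map to more than one suffixed entry (_0, _1, ...) rather than just _0. The preferred fix would therefore be for CatToNumTransform to rename the columns of the TensorFrames it transforms so that they match its statistics.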
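As a rough illustration of why a one-to-one rename is not enough in general, the sketch below groups suffixed statistic keys by their original feature name and then expands the column names to match. It assumes the suffixes follow a name_{i} pattern with integer i, and it works on plain Python data; group_suffixed_stats and expand_col_names are hypothetical helpers, not part of torch_frame.

```python
import re


def group_suffixed_stats(stat_keys, categorical_feature_names):
    """Map each categorical feature to its suffixed stat keys, ordered by index."""
    grouped = {}
    for name in categorical_feature_names:
        pattern = re.compile(rf"^{re.escape(name)}_(\d+)$")
        matches = []
        for key in stat_keys:
            m = pattern.match(key)
            if m:
                matches.append((int(m.group(1)), key))
        grouped[name] = [key for _, key in sorted(matches)]
    return grouped


def expand_col_names(col_names, grouped):
    """Replace each categorical column name with its suffixed names."""
    expanded = []
    for name in col_names:
        expanded.extend(grouped.get(name, [name]))
    return expanded


# Hypothetical three-class case: each categorical feature has two suffixed entries.
stat_keys = ["age", "city_0", "city_1", "job_0", "job_1"]
grouped = group_suffixed_stats(stat_keys, ["city", "job"])
print(grouped)  # {'city': ['city_0', 'city_1'], 'job': ['job_0', 'job_1']}
print(expand_col_names(["age", "city", "job"], grouped))
# ['age', 'city_0', 'city_1', 'job_0', 'job_1']
```

Renaming in this direction (columns to match the statistics) keeps every suffixed entry, whereas popping only the _0 key would leave the others orphaned.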

Oct 07 '24 08:10 mtreca

Thanks for reporting! I will take a look.

Nov 17 '24 20:11 yiweny