Potential Issue with the ExcelFormer Example
Hi all!
Stepping through the example script for ExcelFormer, I notice that this line fails with my custom dataset.
AFAICT this happens because CatToNumTransform appends an _{i} suffix to the categorical feature names in its statistics, but does not rename the corresponding columns in the TensorFrame it outputs. Hence, the mutual_info_sort.transformed_stats passed to ExcelFormer on line 107 contains the suffixed _{i} categorical column names, while the actual TensorFrame still has the original names.
Case in point, calling this snippet to manually rename statistics to their original name fixes the issue:
fixed_stats = cat_to_num.transformed_stats
for cat_feature in categorical_feature_names:
    # Move the stats stored under "<name>_0" back to the original "<name>" key.
    stats = fixed_stats.pop(f"{cat_feature}_0")
    fixed_stats[cat_feature] = stats
That fix might not work if the classification task is not binary, though, so the preferred fix would be for CatToNumTransform to actually rename the columns of the TensorFrames it transforms.
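In the meantime, the workaround can be made to fail loudly rather than silently in the non-binary case. This is only a sketch: restore_original_names is a hypothetical helper (not part of torch_frame), and it assumes the stats are a plain dict keyed by column name and that the suffix always has the form _<integer>.

```python
import re


def restore_original_names(stats, categorical_feature_names):
    """Return a copy of `stats` with suffixed keys renamed to the original name.

    Hypothetical helper: assumes `stats` is a dict keyed by column name and
    that transformed categorical columns are stored as "<name>_<integer>".
    """
    fixed = dict(stats)  # shallow copy, so the caller's dict is left untouched
    for name in categorical_feature_names:
        pattern = re.compile(rf"{re.escape(name)}_\d+")
        suffixed = [key for key in fixed if pattern.fullmatch(key)]
        if not suffixed:
            continue  # nothing to rename for this feature
        if len(suffixed) > 1:
            # Several suffixed entries for one feature: a one-to-one rename
            # back to the original name is not possible.
            raise ValueError(
                f"{name!r} has {len(suffixed)} suffixed entries: {sorted(suffixed)}"
            )
        fixed[name] = fixed.pop(suffixed[0])
    return fixed


# Usage with made-up stats for one categorical and one numerical column.
example_stats = {"color_0": {"mean": 0.5}, "age": {"mean": 30.0}}
print(restore_original_names(example_stats, ["color"]))
```

Note that this would also rename an untransformed column that happens to be called something like color_1, so it is only safe when no original column name has that form.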
Thanks for reporting! I will take a look.