NumericColumnNames won't return more than 1 column

Open Phoenix-313 opened this issue 3 years ago • 2 comments

Hi,

CategoricalColumnNames returns the correct count and names of the categorical columns. NumericColumnNames, however, returns the correct count and column name only if the dataset has a single numeric column. If the dataset has more than one numeric column, it always returns a count of 1, and the column name is always "Features" for some reason!

For example, imagine the following dataset:

x1, x2, x3, x4
1, T, 3, A
2, T, 4, A
3, L, 4, A
4, L, 4, B

CategoricalColumnNames will return a count of 2 categorical columns with the names x2 and x4. However, NumericColumnNames will return a count of 1 instead of 2, and one column name which is "Features" instead of x1 and x3.

This is how they are implemented:

ColumnInferenceResults columnInference = MLContext.Auto().InferColumns(TrainingDataPath, labelColumnIndex: 4, hasHeader: true);
ColumnInformation columnInformation = columnInference.ColumnInformation;
ICollection<string> CatCols = columnInformation.CategoricalColumnNames;
ICollection<string> NumCols = columnInformation.NumericColumnNames;

Please help. Thanks.


Phoenix-313, Apr 20 '22 13:04

@LittleLittleCloud any thoughts on this? I am not super familiar with how AutoML is doing this stuff. Is this going to be fixed/changed by your AutoML changes? Or is it something I need to look into more?

michaelgsharp, May 09 '22 18:05

Hi Michael,

It seems that AutoML concatenates all the numeric columns into a single column called "Features" when there is more than one. If there is only one numeric column, however, it keeps its original name.

If I'm not mistaken, this is not mentioned anywhere in the online documentation. It would be nice if this information were added to the page linked below to avoid future confusion like mine:

https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.automl.columninformation.numericcolumnnames?view=ml-dotnet-preview

Phoenix-313, May 10 '22 14:05

Per this issue, if you set groupColumns: false it will separate the columns.
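
For anyone landing here later, a minimal sketch of what that call might look like. The file path is a placeholder, and the labelColumnIndex and hasHeader values are simply copied from the snippet in the original post; adjust them to your own data. It needs the Microsoft.ML.AutoML package.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;

var mlContext = new MLContext();

// Hypothetical path; replace with your own CSV file.
const string trainingDataPath = "data.csv";

// groupColumns: false asks InferColumns not to merge the numeric columns
// into one grouped column (the "Features" column described above).
ColumnInferenceResults columnInference = mlContext.Auto().InferColumns(
    trainingDataPath,
    labelColumnIndex: 4,   // same value as in the original post
    hasHeader: true,
    groupColumns: false);

ColumnInformation columnInformation = columnInference.ColumnInformation;

// Print whatever names were inferred for each kind of column.
Console.WriteLine("Numeric: " + string.Join(", ", columnInformation.NumericColumnNames));
Console.WriteLine("Categorical: " + string.Join(", ", columnInformation.CategoricalColumnNames));
```

With grouping turned off, NumericColumnNames should list the individual numeric columns rather than a single grouped one.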

beccamc, Feb 03 '23 20:02