machinelearning
NumericColumnNames won't return more than 1 column
Hi,
CategoricalColumnNames returns the correct count and names of the categorical columns. NumericColumnNames, however, returns the correct count and column name only when the dataset has a single numerical column. If the dataset has more than one numerical column, it always returns a count of 1, and the column name is always "Features".
For example, imagine the following dataset:
x1, x2, x3, x4
1, T, 3, A
2, T, 4, A
3, L, 4, A
4, L, 4, B
CategoricalColumnNames will return a count of 2 categorical columns with the names x2 and x4. However, NumericColumnNames will return a count of 1 instead of 2, and one column name which is "Features" instead of x1 and x3.
This is how I am calling them:
ColumnInferenceResults columnInference = MLContext.Auto().InferColumns(TrainingDataPath, labelColumnIndex: 4, hasHeader: true);
ColumnInformation columnInformation = columnInference.ColumnInformation;
ICollection
ICollection
Please help. Thanks.
Document Details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
- ID: b000cf86-79fe-a677-3b39-f834e1c4b959
- Version Independent ID: af43d324-d6c1-c104-c16b-81580a638de2
- Content: ColumnInformation.NumericColumnNames Property (Microsoft.ML.AutoML)
- Content Source: dotnet/xml/Microsoft.ML.AutoML/ColumnInformation.xml
- Product: dotnet-ml-api
- GitHub Login: @natke
- Microsoft Alias: nakersha
@LittleLittleCloud any thoughts on this? I am not super familiar with how AutoML handles this. Is this going to be fixed or changed by your AutoML changes, or is it something I need to look into more?
Hi Michael,
It seems that when there is more than one numeric column, AutoML concatenates all of them into a single column called "Features". If there is only one numeric column, it keeps its original name.
If I'm not mistaken, this is not mentioned anywhere in the online documentation. It would be nice if this information were added to the page linked below, to avoid future confusion like mine:
https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.automl.columninformation.numericcolumnnames?view=ml-dotnet-preview
Per this issue, if you set groupColumns: false, InferColumns will keep the columns separate instead of grouping them.
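For anyone landing here later, a minimal sketch of that workaround, using the sample dataset from the original post. It assumes the Microsoft.ML and Microsoft.ML.AutoML packages are referenced. The temporary file name and the choice of x4 as the label (index 3, assuming labelColumnIndex is zero-based) are my own assumptions for illustration, not something taken from the original code.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.AutoML;

public static class Program
{
    public static void Main()
    {
        // Write the sample dataset from this issue to a temporary file
        // (hypothetical file name, for illustration only).
        string path = Path.Combine(Path.GetTempPath(), "numeric-columns-sample.csv");
        File.WriteAllLines(path, new[]
        {
            "x1,x2,x3,x4",
            "1,T,3,A",
            "2,T,4,A",
            "3,L,4,A",
            "4,L,4,B",
        });

        var mlContext = new MLContext();

        // Assumption: x4 is the label and labelColumnIndex is zero-based.
        // groupColumns: false asks InferColumns not to group the feature
        // columns into a single "Features" column.
        ColumnInferenceResults inference = mlContext.Auto().InferColumns(
            path,
            labelColumnIndex: 3,
            hasHeader: true,
            groupColumns: false);

        ColumnInformation info = inference.ColumnInformation;

        // Print whatever names the inference reports for each purpose.
        Console.WriteLine("Numeric: " + string.Join(", ", info.NumericColumnNames));
        Console.WriteLine("Categorical: " + string.Join(", ", info.CategoricalColumnNames));
    }
}
```

With the default (groupColumns: true) the same call should show the grouped "Features" name described above. How each column is classified on such a tiny dataset may differ from what you expect, so treat the printed names as something to check rather than a guarantee.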