machinelearning
Field with type string cannot be transformed for one hot encoder
System Information:
- OS & Version: Windows 10
- ML.NET Version: ML.NET v3.0.0-preview.23511.1
- .NET Version: .NET 7.0
Describe the bug
I can't transform new input data with a pipeline that contains a one-hot encoder. The string column cannot be processed.
To Reproduce
Steps to reproduce the behavior:
```csharp
using Microsoft.Data.Analysis;
using Microsoft.ML;

var mlContext = new MLContext();

// Define DataViewSchema for the data preparation pipeline and the trained model
DataViewSchema dataPrepPipelineSchema, modelSchema;

// Load the saved data preparation pipeline and trained model
ITransformer dataPrepPipeline = mlContext.Model.Load("data_preparation_pipeline.zip", out dataPrepPipelineSchema);
ITransformer predictionPipeline = mlContext.Model.Load("model.zip", out modelSchema);

// Load new data (DataFrame.LoadCsv infers each column's type from the file)
var newData = DataFrame.LoadCsv("data/input.csv");

// Preprocess the data, then run the model
IDataView transformedNewData = dataPrepPipeline.Transform(newData);
IDataView predictions = predictionPipeline.Transform(transformedNewData);
```
Expected behavior
The model can transform new data that contains a string column.

Attachments: data_preparation_pipeline.zip, model.zip, input.csv
Hi @VadimPeczynski,
Is the right column or value? The error says you're trying to load a float value when it's expecting a string. Do you have the actual pipeline code, so we can see how you're building the data prep pipeline?
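One way to find which column triggers the float-versus-string error is to compare the input schema stored in data_preparation_pipeline.zip with the schema that DataFrame.LoadCsv infers for input.csv. The sketch below assumes both files are at the paths used in the repro; it relies only on `MLContext.Model.Load`, `DataViewSchema.GetColumnOrNull` and `DataFrame.LoadCsv`, and prints each training column that is missing or typed differently in the new data.

```csharp
using System;
using Microsoft.Data.Analysis;
using Microsoft.ML;

var mlContext = new MLContext();

// Schema the data prep pipeline was saved with (testData.Schema at training time)
mlContext.Model.Load("data_preparation_pipeline.zip", out DataViewSchema expectedSchema);

// Schema that DataFrame.LoadCsv infers for the new file
DataFrame newData = DataFrame.LoadCsv("data/input.csv");
DataViewSchema actualSchema = ((IDataView)newData).Schema;

// Print every visible training column that is missing or has a different type in input.csv
foreach (DataViewSchema.Column expected in expectedSchema)
{
    if (expected.IsHidden)
        continue;

    DataViewSchema.Column? actual = actualSchema.GetColumnOrNull(expected.Name);
    if (actual is null)
        Console.WriteLine($"{expected.Name}: missing in input.csv (expected {expected.Type})");
    else if (!actual.Value.Type.Equals(expected.Type))
        Console.WriteLine($"{expected.Name}: expected {expected.Type}, got {actual.Value.Type}");
}
```

If nothing is printed, the column types match and the problem lies elsewhere.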
This issue has been marked needs-author-action and may be missing some important information.
Hi @luisquintanilla,
The code for the transformation pipeline looks like this:
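```csharp
using Microsoft.ML;
using Microsoft.ML.Transforms;

var pipelineEstimator =
    // Replace missing total_bedrooms values with the column's most frequent value
    mlContext.Transforms.ReplaceMissingValues(
            new[] { new InputOutputColumnPair("total_bedrooms") },
            MissingValueReplacingEstimator.ReplacementMode.Mode)
        // One-hot encode the categorical ocean_proximity column as an indicator vector
        .Append(mlContext.Transforms.Categorical.OneHotEncoding(
            new[] { new InputOutputColumnPair("ocean_proximity") },
            OneHotEncodingEstimator.OutputKind.Indicator));
```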
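As a sanity check that this pipeline accepts a string `ocean_proximity` column, it can be fitted on a small in-memory DataFrame whose column types are set explicitly (float for `total_bedrooms`, string for `ocean_proximity`). The three rows are hypothetical values, not taken from the attached data. If this runs, the encoder itself is probably fine, and the types of the data passed to `Transform` become the more likely suspect.

```csharp
using System;
using Microsoft.Data.Analysis;
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Transforms;

var mlContext = new MLContext();

// Hypothetical sample rows: total_bedrooms is float (NaN = missing), ocean_proximity is string
var sample = new DataFrame(
    new PrimitiveDataFrameColumn<float>("total_bedrooms", new float?[] { 129f, float.NaN, 129f }),
    new StringDataFrameColumn("ocean_proximity", new[] { "NEAR BAY", "INLAND", "NEAR BAY" }));

// Same two steps as the pipeline in the issue
var pipelineEstimator =
    mlContext.Transforms.ReplaceMissingValues(
            new[] { new InputOutputColumnPair("total_bedrooms") },
            MissingValueReplacingEstimator.ReplacementMode.Mode)
        .Append(mlContext.Transforms.Categorical.OneHotEncoding(
            new[] { new InputOutputColumnPair("ocean_proximity") },
            OneHotEncodingEstimator.OutputKind.Indicator));

IDataView transformed = pipelineEstimator.Fit(sample).Transform(sample);

// ocean_proximity is now a float indicator vector with one slot per category seen during Fit
var encodedType = (VectorDataViewType)transformed.Schema["ocean_proximity"].Type;
Console.WriteLine($"ocean_proximity: {encodedType.ItemType} vector, size {encodedType.Size}");
```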
The attached data is one of the rows from my training set, so its format should be compatible with the pipeline.
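If the mismatch comes from DataFrame.LoadCsv inferring a different type than the training data had, one possible workaround is to pass explicit `dataTypes` built from the schema saved with the pipeline. This sketch assumes input.csv has the same columns, in the same order, as the data the pipeline was fitted on (the row comes from the training set), and that every column is scalar. It maps ML.NET's text type (`ReadOnlyMemory<char>`) to `string`, which is how DataFrame stores text.

```csharp
using System;
using System.Linq;
using Microsoft.Data.Analysis;
using Microsoft.ML;

var mlContext = new MLContext();

ITransformer dataPrepPipeline = mlContext.Model.Load("data_preparation_pipeline.zip", out DataViewSchema inputSchema);
ITransformer predictionPipeline = mlContext.Model.Load("model.zip", out DataViewSchema modelSchema);

// Assumption: input.csv contains exactly these (scalar) columns, in this order
DataViewSchema.Column[] trainingColumns = inputSchema.Where(c => !c.IsHidden).ToArray();

// ML.NET text columns have raw type ReadOnlyMemory<char>; DataFrame expects string instead
Type[] dataTypes = trainingColumns
    .Select(c => c.Type.RawType == typeof(ReadOnlyMemory<char>) ? typeof(string) : c.Type.RawType)
    .ToArray();

// Force the training-time types instead of letting LoadCsv guess them from the file
DataFrame newData = DataFrame.LoadCsv("data/input.csv", dataTypes: dataTypes);

IDataView transformedNewData = dataPrepPipeline.Transform(newData);
IDataView predictions = predictionPipeline.Transform(transformedNewData);
```

Building `dataTypes` from the saved schema, rather than hard-coding them, keeps the loader in sync with whatever types the pipeline was trained on.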
I'm saving the pipeline using this command:
```csharp
// Save the data prep transformer
mlContext.Model.Save(pipelineEstimator.Fit(testData), testData.Schema, "data_preparation_pipeline.zip");
```
Hi @luisquintanilla,
Can you reproduce the issue? Do you need more information? Is there a fix for it?