machinelearning
VectorType attribute with dynamic dimension while dealing with csv files
I am working on training and evaluation in ML.NET. The data comes from a .csv file. According to the requirements, the file may contain a variable number of columns. So far I have handled this by giving the VectorType attribute a fixed dimension, but I do not know how to make it work dynamically.
Here is a block of working code that uses the fixed dimension:
using Microsoft.ML;
using Microsoft.ML.Data;

public class InputClass
{
    public bool Label;

    [VectorType(5)] // I want this to be dynamic
    public string[] SpecVec;

    public string PredictedTarget;
}
public static void Main(string[] args)
{
    MLContext mlContext = new MLContext(seed: 0);

    // Column 0 is the label, column 1 the target, columns 2..6 the vector.
    TextLoader.Column[] columns = new TextLoader.Column[3];
    columns[0] = new TextLoader.Column("Label", DataKind.Boolean, 0);
    columns[1] = new TextLoader.Column("SpecVec", DataKind.String, 2, 6);
    columns[2] = new TextLoader.Column("PredictedTarget", DataKind.String, 1);

    IDataView dataview = mlContext.Data.LoadFromTextFile(filepath, columns,
        separatorChar: ',', hasHeader: true, allowQuoting: false, trimWhitespace: true,
        allowSparse: true);

    //-- creating pipeline and training a model further
}
This code works without any issue, but I need VectorType to accept its dimension at runtime; at the moment it accepts only constant values. Is there any workaround to achieve this? I am a beginner with ML.NET, so please bear with me if I have missed a step or left something unexplained. I would really appreciate any help getting this to work.
@rishi-git when you say dynamically, what exactly do you mean? If you do this:
[VectorType()]
public string[] SpecVec;
It should mark SpecVec as a vector of unknown size. (I think you can even leave off the VectorType attribute and it will just be treated as a vector of unknown size.)
For the text loader, you can use the constructor that takes a Range to load from a certain start column through to the last column of each row.
// This will load everything from column 0 through the last column.
// You can change the start column to whatever you want.
new TextLoader.Column("Features",
    DataKind.Single,
    new[] { new TextLoader.Range(0, null) })
Does something like this work in your situation?
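Applied to the column layout from the original question, the Range suggestion might look like the sketch below. The names `VariableWidthCsv`, `BuildColumns`, and `Load` are hypothetical and only serve the illustration. It assumes the label sits in column 0, the target in column 1, and everything from column 2 onward belongs to `SpecVec`. My understanding, which is worth verifying against your ML.NET version, is that an open-ended range lets the loader work out the vector width from the file itself, so no constant is needed in the code.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

public static class VariableWidthCsv
{
    // Hypothetical helper: column layout taken from the question,
    // with the fixed 2..6 span replaced by an open-ended range.
    public static TextLoader.Column[] BuildColumns()
    {
        return new[]
        {
            new TextLoader.Column("Label", DataKind.Boolean, 0),
            new TextLoader.Column("PredictedTarget", DataKind.String, 1),
            // max: null means "from column 2 through the last column of the row".
            new TextLoader.Column("SpecVec", DataKind.String,
                new[] { new TextLoader.Range(2, null) }),
        };
    }

    // Hypothetical helper: loads a comma-separated file that has a header row.
    public static IDataView Load(MLContext mlContext, string path)
    {
        return mlContext.Data.LoadFromTextFile(path, BuildColumns(),
            separatorChar: ',', hasHeader: true);
    }
}
```

After loading, `data.Schema["SpecVec"].Type` shows which vector type the loader actually settled on, which is a quick way to check whether the width matches the file.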
@michaelgsharp Thank you very much for your reply. By "dynamically" I mean that the dimension passed to the VectorType attribute must be set at runtime. I tried your code in my program, but it throws the error "System.ArgumentOutOfRangeException: Schema mismatch for score column 'Score': expected Single, got Vector<Single, 5>". The error occurs during evaluation, at the following line of code:
mlContext.BinaryClassification.Evaluate(predictions, labelColumnName : "Label");
I appreciate any further help, thanks.
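The message says the evaluator found a `Score` column of type `Vector<Single, 5>` where it expected a scalar `Single`. One way to narrow down which step produced that column is to dump the schema of `predictions` just before calling `Evaluate`. The helper below is a minimal diagnostic sketch; `SchemaDebug` and `Describe` are hypothetical names, and it only reports the schema rather than fixing the mismatch.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

public static class SchemaDebug
{
    // Hypothetical helper: returns one "name: type" line per column in the view.
    public static string Describe(IDataView view)
    {
        return string.Join(Environment.NewLine,
            view.Schema.Select(column => $"{column.Name}: {column.Type}"));
    }
}
```

Calling `Console.WriteLine(SchemaDebug.Describe(predictions));` right before the `Evaluate` line lists every column along with its type, so a vector-typed `Score` and the columns around it are easy to spot.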
Hmm. If that's the case, I wonder if we have a bug, because [VectorType()] with no parameters should declare the field as a vector type, but it's clearly not working. Let me take a closer look and see what I can find.
Hey, did anybody resolve that problem? I am also dealing with this...
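For anyone arriving with the same question: when the data goes through a typed class (for example via `LoadFromEnumerable`), one possible way to supply the vector size at runtime is a `SchemaDefinition` whose column type is overridden in code rather than through the attribute. The sketch below is an assumption about how that could be applied here, not a confirmed fix for this issue. `InputRow`, `RuntimeVectorSize`, and `Load` are hypothetical names, and `vectorSize` is whatever value your program determines at runtime.

```csharp
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input class: no size on the vector field,
// because the size is supplied at runtime below.
public class InputRow
{
    public bool Label;
    public string[] SpecVec;
}

public static class RuntimeVectorSize
{
    // Hypothetical helper: builds an IDataView whose SpecVec column is a
    // text vector of the requested size.
    public static IDataView Load(MLContext mlContext, IEnumerable<InputRow> rows, int vectorSize)
    {
        SchemaDefinition schema = SchemaDefinition.Create(typeof(InputRow));

        // Override the inferred column type with a known-size text vector.
        schema["SpecVec"].ColumnType =
            new VectorDataViewType(TextDataViewType.Instance, vectorSize);

        return mlContext.Data.LoadFromEnumerable(rows, schema);
    }
}
```

Note that the code in the original question calls the non-generic `LoadFromTextFile` with explicit `TextLoader.Column` definitions, so as far as I can tell the attribute on the class is not consulted at that stage; the schema override matters where the typed class is actually used.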