machinelearning

VectorType attribute with dynamic dimension while dealing with csv files

Open • rishi-git opened this issue • 4 comments

I am working on training and evaluation in ML.NET, with data coming from a .csv file. Depending on the requirements, the file may contain a variable number of columns. So far I have handled this by giving the VectorType attribute a fixed dimension, but I don't know how to make the dimension dynamic.

Here is a block of code that works with the fixed dimension:

using Microsoft.ML;
using Microsoft.ML.Data;

public class InputClass
{
    public bool Label;

    [VectorType(5)] // I want this to be dynamic
    public string[] SpecVec;

    public string PredictedTarget;
}

public static void Main(string[] args)
{
    MLContext mlContext = new MLContext(seed: 0);

    // filepath: path to the .csv file
    TextLoader.Column[] columns = new TextLoader.Column[3];
    columns[0] = new TextLoader.Column("Label", DataKind.Boolean, 0);
    columns[1] = new TextLoader.Column("SpecVec", DataKind.String, 2, 6); // columns 2..6 -> 5 values
    columns[2] = new TextLoader.Column("PredictedTarget", DataKind.String, 1);

    IDataView dataview = mlContext.Data.LoadFromTextFile(filepath, columns,
        separatorChar: ',', hasHeader: true, allowQuoting: false, trimWhitespace: true, allowSparse: true);

    //-- creating pipeline and training a model further
}

This code works without any issue, but I need VectorType to accept the dimension at runtime; right now it only accepts constant values. Is there a workaround for this? I am a beginner with ML.NET, so please let me know if I have missed any steps or need to explain anything further. I really appreciate any help getting this to work.
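One possible workaround, sketched here under assumptions rather than confirmed anywhere in this thread: the TextLoader columns are already built at runtime, so the vector's column range can be computed from the file's header row instead of being hard-coded. The helper below (the names CsvLayout, ReadHeader and VectorRange are hypothetical, not part of ML.NET) counts the separator-delimited fields in the header and returns the inclusive start and end indices of the vector. It assumes the vector runs from a known start column through the last column and that no field is quoted, which matches allowQuoting: false in the code above.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of ML.NET: derives the vector column range
// from the CSV header at runtime instead of hard-coding it.
public static class CsvLayout
{
    // Returns the first line of the file (the header row).
    public static string ReadHeader(string path) => File.ReadLines(path).First();

    // Inclusive [Start, End] field indices of a vector that begins at
    // vectorStart and runs through the last column of the header.
    // Assumes a single-character separator and no quoted fields.
    public static (int Start, int End) VectorRange(string headerLine, int vectorStart, char separator = ',')
    {
        int fieldCount = headerLine.Split(separator).Length;
        if (vectorStart < 0 || vectorStart >= fieldCount)
            throw new ArgumentOutOfRangeException(nameof(vectorStart),
                "The vector must start at an existing column.");
        return (vectorStart, fieldCount - 1);
    }
}
```

With the layout from the question (Label in column 0, PredictedTarget in column 1, SpecVec from column 2 onward), a 7-field header gives (2, 6), i.e. a 5-element vector. Those values can be passed to the same `new TextLoader.Column("SpecVec", DataKind.String, start, end)` constructor used in the original code. This only sizes the loader's column; C# attribute arguments must be compile-time constants, so a sized [VectorType(n)] on InputClass still cannot take a runtime value.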

rishi-git, Apr 08 '22

@rishi-git, what exactly do you mean by "dynamically"? If you do this:

[VectorType()]
public string[] SpecVec;

It should mark SpecVec as a vector of unknown size. (I think you can even leave off the VectorType attribute entirely and it will still be treated as a vector of unknown size.)

For the text loader, you can use the Column constructor that takes an array of Range objects, so it loads from a given start column through the last column of each line.

new TextLoader.Column("Features",
                      DataKind.Single,
                      new[] { new TextLoader.Range(0, null) }) // Loads from column 0 through the last column of each line. You can change the start column to whatever you want.

Does something like this work in your situation?

michaelgsharp, Apr 11 '22

@michaelgsharp Thank you very much for your reply. By "dynamically" I mean that the dimension of the VectorType attribute must be set at runtime. I tried your code in my program, but it throws "System.ArgumentOutOfRangeException: Schema mismatch for score column 'Score': expected Single, got Vector<Single, 5>". The error occurs during evaluation, on the following line:

mlContext.BinaryClassification.Evaluate(predictions, labelColumnName: "Label");

I appreciate any further help, thanks.

rishi-git, Apr 12 '22

Hmm. If that's the case, I wonder whether we have a bug: just using [VectorType()] with no parameters should declare it as a vector type, but it's clearly not working. Let me take a closer look and see what I can find.

michaelgsharp, Apr 26 '22

Hey, did anybody resolve this problem? I am dealing with it too.

mitsuomiyazato, Mar 03 '24