Add support for multi-dimensional arrays for model input/output.

Open jannickj opened this issue 2 years ago • 6 comments

I have a fully working TensorFlow model and I literally just need the last step of having C# run my model, but I am stuck on a null exception.

I have a very simple setup, and I've locked down both sequence length and batch size; however, no matter what I do, it gives me this exception:

  at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.<>c__DisplayClass8_0`1.<CreateDirectVBufferSetter>b__0(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.TypedRowBase.FillValues(TRow row)
   at Microsoft.ML.Data.TypedCursorable`1.RowImplementation.FillValues(TRow row)
   at Microsoft.ML.PredictionEngineBase`2.FillValues(TDst prediction)
   at Microsoft.ML.PredictionEngine`2.Predict(TSrc example, TDst& prediction)
   at MyProject.Model.Run() in 

I have tested that the model works in Python, and I've made 100% sure the dimensions fit exactly.

public record Features
	{

		[ColumnName("x_1")]
		[VectorType(1, 41, 3)]
		public int[,,] UnigramWindows { get; set; } = null!;
		[ColumnName("x_2")]
		[VectorType(1, 41, 3)]
		public int[,,] BigramWindows { get; set; } = null!;
		[ColumnName("x_3")]
		[VectorType(1, 41, 3)]
		public int[,,] CharTypeWindows { get; set; } = null!;
		[ColumnName("x_4")]
		[VectorType(1, 41, 41)]
		public int[,,] WordsStartingAt { get; set; } = null!;
		[ColumnName("x_5")]
		[VectorType(1, 41, 41)]
		public int[,,] WordsEndingAt { get; set; } = null!;
		[ColumnName("x")]
		[VectorType(1)]
		public int[] SeqLen { get; set; } = null!;
	}

private record Output
{
	[VectorType(1, 41, 6)]
	public float[,,] Identity;
}


private static ITransformer LoadModel(
	MLContext mlContext,
	string modelPath)
{
	var tfModel = mlContext.Model
		.LoadTensorFlowModel(modelPath);
	var schema = tfModel.GetModelSchema();
	var revSchema = schema.Reverse().ToArray();
	var pipeline =
		tfModel
		.ScoreTensorFlowModel(
				outputColumnNames: new[] { "Identity" },
				inputColumnNames:
			 	new[] {
			 		"x",
			 		"x_1",
			 		"x_2",
			 		"x_3",
			 		"x_4",
			 		"x_5",
			 	},
				addBatchDimensionInput: false);



	var dataView = mlContext.Data.LoadFromEnumerable(Enumerable.Empty<Features>());
	ITransformer mlModel = pipeline.Fit(dataView);

	return mlModel;
}

public static void Run()
{
	var model = LoadModel(mlContext, "model.pb");
	var predictionEngine = mlContext
		.Model
		.CreatePredictionEngine<Features, Output>(model);

	var res = predictionEngine.Predict(features);

	Console.WriteLine(System.Text.Json.JsonSerializer.Serialize(res));
}

jannickj avatar Jan 31 '22 22:01 jannickj

To any unfortunate soul who has had to deal with the same issue: I finally figured it out. Multi-dimensional arrays such as int[,,] are not supported in ML.NET; you're supposed to flatten the arrays yourself ><
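
For anyone landing here, a minimal sketch of what "flatten the arrays yourself" could look like. The class and method names (TensorFlattening, Flatten) are made up for illustration and are not part of ML.NET. It assumes the model expects row-major order (last index varies fastest), which matches how C# stores int[,,]. The input property would then presumably be declared as a plain int[]; whether the VectorType dimensions can stay unchanged is an assumption not checked here.

```csharp
using System;

public static class TensorFlattening
{
    // Copies a rank-3 array into a new rank-1 array in row-major order
    // (last index varies fastest), the same order C# uses for int[,,].
    public static int[] Flatten(int[,,] source)
    {
        var flat = new int[source.Length];
        // Buffer.BlockCopy counts in bytes, hence the sizeof(int) factor.
        Buffer.BlockCopy(source, 0, flat, 0, source.Length * sizeof(int));
        return flat;
    }

    public static void Main()
    {
        // Same shape as the x_1 input in the question: (1, 41, 3).
        var windows = new int[1, 41, 3];
        windows[0, 2, 1] = 7;

        int[] flat = Flatten(windows);

        Console.WriteLine(flat.Length);          // 123
        Console.WriteLine(flat[(2 * 3) + 1]);    // 7
    }
}
```

Buffer.BlockCopy only works for primitive element types; for other element types, source.Cast<T>().ToArray() from System.Linq walks the array in the same order.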

jannickj avatar Feb 01 '22 00:02 jannickj

@jannickj sorry for the confusion about this. This is something that we are discussing. As we are working on adding TorchSharp support to ML.NET, this will probably become a larger issue, so we are planning on revisiting this and discussing it again in the future.

michaelgsharp avatar Feb 01 '22 21:02 michaelgsharp

I think a simple fix for now would be to throw an exception saying that multidimensional arrays are not supported. The problem is that a null exception makes this very hard to figure out.
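
To make the suggestion concrete, here is a rough sketch of such a check using plain reflection. This is not ML.NET code: SchemaGuard, EnsureNoMultiDimArrays, and the Sample class are hypothetical names, and where such a check would actually belong inside the library is left open.

```csharp
using System;
using System.Reflection;

public static class SchemaGuard
{
    // Hypothetical validation: rejects row types that expose arrays of rank > 1.
    public static void EnsureNoMultiDimArrays(Type rowType)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;

        foreach (PropertyInfo p in rowType.GetProperties(flags))
            Check(rowType, p.Name, p.PropertyType);
        foreach (FieldInfo f in rowType.GetFields(flags))
            Check(rowType, f.Name, f.FieldType);
    }

    private static void Check(Type owner, string member, Type memberType)
    {
        // int[,,] has rank 3; int[] and jagged int[][] both have rank 1.
        if (memberType.IsArray && memberType.GetArrayRank() > 1)
        {
            throw new NotSupportedException(
                $"{owner.Name}.{member} is a multidimensional array ({memberType.Name}); " +
                "flatten it to a one-dimensional array instead.");
        }
    }

    // Illustrative row type, loosely modelled on the Output record in the question.
    private sealed class Sample
    {
        public float[,,] Identity = new float[1, 41, 6];
    }

    public static void Main()
    {
        try
        {
            EnsureNoMultiDimArrays(typeof(Sample));
        }
        catch (NotSupportedException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```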

jannickj avatar Feb 01 '22 21:02 jannickj

OK, so just to clarify: if I load a TF model into ML.NET that needs a matrix ([,]) as input, will I get proper output?

KonradZaremba avatar Jul 26 '23 21:07 KonradZaremba

Any news on this? I'm working on a POC for a customer. I have a TensorFlow model (N, 160, 6), but I'm not able to input an array. How can I do that?

julianogimenez avatar Jan 11 '24 15:01 julianogimenez

@julianogimenez you just have to squeeze your multi-dimensional array into a single-dimensional array, i.e. from (N, 160, 6) -> (N * 160 * 6,)
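
In case the index arithmetic helps, a small sketch of how element [i, j, k] of an (N, 160, 6) array maps into the flat (N * 160 * 6,) array, and how a flat buffer can be turned back into a 3D array. FlatIndexing, Offset, and Unflatten are illustrative names only, and row-major order is assumed.

```csharp
using System;

public static class FlatIndexing
{
    // Row-major offset of element [i, j, k] in a flattened (d0, d1, d2) array.
    public static int Offset(int i, int j, int k, int d1, int d2)
        => ((i * d1) + j) * d2 + k;

    // Rebuilds a rank-3 array from a flat row-major buffer.
    public static float[,,] Unflatten(float[] flat, int d0, int d1, int d2)
    {
        if (flat.Length != d0 * d1 * d2)
            throw new ArgumentException("Buffer length does not match the requested shape.");

        var result = new float[d0, d1, d2];
        // Buffer.BlockCopy counts in bytes, hence the sizeof(float) factor.
        Buffer.BlockCopy(flat, 0, result, 0, flat.Length * sizeof(float));
        return result;
    }

    public static void Main()
    {
        const int n = 2, steps = 160, channels = 6;

        var flat = new float[n * steps * channels];
        flat[Offset(1, 5, 3, steps, channels)] = 0.5f;

        float[,,] cube = Unflatten(flat, n, steps, channels);

        Console.WriteLine(flat.Length);              // 1920
        Console.WriteLine(cube[1, 5, 3] == 0.5f);    // True
    }
}
```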

jannickj avatar Jan 24 '24 16:01 jannickj