machinelearning icon indicating copy to clipboard operation
machinelearning copied to clipboard

Error while using Model.Save()

Open ML-pixel opened this issue 3 years ago • 3 comments
trafficstars

System Information (please complete the following information):

  • Checked on multiple systems (win 10, win 2019,2021 server)

  • ML.NET Version: ML.NET 1.7.0

  • .NET Core 2.2.3

Describe the bug When saving trained model of very large size, xx GB the exception is thrown with the inner exception "The length cannot be greater than the capacity. ParameterName valueCount at System.Text.StringBuilder.Append() at Microsoft.Ml.Data.ReadOnlyMemoryUtils.AppendSpan ...etc What I think is that the Save() uses StringBuilder.Append(char*,Int32) method and the int32 is a problem.

To Reproduce 1.Train extra large model 2.Try to save it with Model.Save()

error_Cut

ML-pixel avatar Aug 10 '22 12:08 ML-pixel

@luisquintanilla It seems like this API is limiting the model size as intended, but there doesn't seem to be a workaround for users who want to save a large model. Is this functionality we need to add to ML.NET, or is there an alternative API I'm missing?

dakersnar avatar Aug 22 '22 20:08 dakersnar

There is no alternative API that I'm aware of.

@ML-pixel do you experience these issues on newer versions of .NET? (i.e. .NET 6)

luisquintanilla avatar Aug 25 '22 17:08 luisquintanilla

There is no alternative API that I'm aware of.

@ML-pixel do you experience these issues on newer versions of .NET? (i.e. .NET 6)

I will check it, but it's not a solution becasue it will make me upgrade production and I'm not sure if its possible right now.

ML-pixel avatar Sep 02 '22 06:09 ML-pixel

@luisquintanilla I've tested it on .net 6 and the issue is still the same. @dakersnar Is there a chance it will be fixed anytime soon?

ML-pixel avatar Sep 30 '22 08:09 ML-pixel

@ML-pixel thanks for confirming this is also the case on .NET 6. Do you know more or less how large your model is?

@dakersnar let's investigate and see what could be causing this. I added it to the ML.NET Future milestone for now. I suspect some of the new deep learning models might run into similar issues.

luisquintanilla avatar Oct 11 '22 14:10 luisquintanilla

@ML-pixel thanks for confirming this is also the case on .NET 6. Do you know more or less how large your model is?

@dakersnar let's investigate and see what could be causing this. I added it to the ML.NET Future milestone for now. I suspect some of the new deep learning models might run into similar issues.

Well, for sure the zip file that was created just before crash was above 25 GB, the RAM consumption is around 400 GB.

ML-pixel avatar Oct 17 '22 09:10 ML-pixel