
Serialization and Deserialization of Classification Models

Open Mightrider opened this issue 4 years ago • 5 comments

Hi!

The Bug

I am playing around with your classifiers and so far I really like them. I started serializing the trained model so that I can use it later for prediction. This worked for the SVM, but doing the same with Logistic Regression, Gradient Boost or Decision Forest failed to compile. I think I am not creating the model for the ModelPtr correctly, because the Model class is abstract.

Using (training) ResultPtr instead of ModelPtr works and I can get the Model from there, but I would like to avoid the additional overhead. Is there a way to create the ModelPtr correctly for those cases or is it the intended way to use the ResultPtr?

Error Message

invalid new-expression of abstract class type ‘daal::algorithms::logistic_regression::interface1::Model’
     daal::algorithms::logistic_regression::interface1::ModelPtr vModelPtr(new daal::algorithms::logistic_regression::interface1::Model());

To Reproduce

auto vModel = vTrainingResult->get(classifier::training::model);
vModel->serialize(...);

// this compiled and provided the same results compared to the results without serialization
daal::algorithms::multi_class_classifier::interface1::ModelPtr vModelPtr(new daal::algorithms::multi_class_classifier::interface1::Model());

// These three fail to compile
daal::algorithms::logistic_regression::interface1::ModelPtr vModelPtr(new daal::algorithms::logistic_regression::interface1::Model());
daal::algorithms::decision_forest::classification::interface1::ModelPtr vModelPtr(new daal::algorithms::decision_forest::classification::interface1::Model());
daal::algorithms::gbt::classification::interface1::ModelPtr vModelPtr(new daal::algorithms::gbt::classification::interface1::Model());

vModelPtr->deserialize(...);

Environment:

  • OS: [Ubuntu 18.04]
  • Compiler: [GCC7.4]
  • Version: [2019 Update 5]

Mightrider avatar Jan 13 '20 10:01 Mightrider

Those algorithms have an incorrect model-hiding implementation, which looks like the root cause of the issue. @SmirnovEgorRu, could you please assign the issue to somebody to fix it? I can provide some details if necessary.

averbukh avatar Jan 20 '20 10:01 averbukh

While a better solution is in progress, please try the work-around below:

training::Batch<float> train(nClasses);

train.input.set(classifier::training::data, testData);

train.getResult()->allocate<float>(train.getInput(),&train.parameter,0);

/* deserialize the Model */
train.getResult()->get(classifier::training::model)->deserialize(out_dataArch);

We have to create the algorithm and allocate its result. Then we can deserialize the model from the OutputDataArchive.
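The pattern behind this work-around can be sketched without oneDAL at all. The following is a minimal, self-contained illustration, not library code: `Model`, `ModelImpl`, `Result` and `roundTripBias` are hypothetical stand-ins, and a `std::stringstream` plays the role of the data archive. It shows why `new Model()` cannot compile for an abstract type, and how a result object that allocates the concrete model lets the caller deserialize through the abstract interface.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <type_traits>

// Hypothetical stand-in for an abstract library model type (not oneDAL code).
class Model {
public:
    virtual ~Model() = default;
    virtual void serialize(std::ostream& out) const = 0;
    virtual void deserialize(std::istream& in) = 0;
    virtual double bias() const = 0;
};

// Concrete implementation that a library would normally keep hidden.
class ModelImpl : public Model {
public:
    explicit ModelImpl(double bias = 0.0) : bias_(bias) {}
    void serialize(std::ostream& out) const override { out << bias_; }
    void deserialize(std::istream& in) override { in >> bias_; }
    double bias() const override { return bias_; }
private:
    double bias_;
};

// `new Model()` is rejected by the compiler because Model is abstract.
static_assert(std::is_abstract<Model>::value, "Model has pure virtual members");

// Hypothetical result container: allocate() builds the concrete model,
// which is the role the allocate call plays in the work-around above.
class Result {
public:
    void allocate() { model_ = std::make_shared<ModelImpl>(); }
    std::shared_ptr<Model> model() const { return model_; }
private:
    std::shared_ptr<Model> model_;
};

// Serialize a "trained" model, then restore it through an allocated Result.
inline double roundTripBias(double trainedBias) {
    ModelImpl trained(trainedBias);
    std::stringstream archive;  // stands in for the data archive
    trained.serialize(archive);

    Result restored;
    restored.allocate();                     // concrete model is created here
    assert(restored.model() != nullptr);
    restored.model()->deserialize(archive);  // filled via the abstract interface
    return restored.model()->bias();
}
```

Calling `roundTripBias(2.5)` writes the value to the stream and reads it back through a model the caller never constructed directly. The caller only ever sees the abstract `Model` pointer, so the concrete type stays hidden.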

ShvetsKS avatar Jan 21 '20 14:01 ShvetsKS

Hello @Mightrider
Do you need further help on this issue? Can we close this issue?

PivovarA avatar Feb 20 '20 15:02 PivovarA

Well, I managed to implement a workaround by serializing/deserializing the result instead of the model directly. So fixing this issue is not a high priority for me, but as far as I can tell the problem still exists. Unless models are not intended to be serialized/deserialized at all...

Mightrider avatar Feb 20 '20 16:02 Mightrider

Actually we are still failing to serialize / deserialize certain models. Could this issue be considered for fixing again?

emmenlau avatar Oct 13 '20 09:10 emmenlau

I wanted to check and see if there was any update on this. We've hit this as well and would love to be able to construct models directly for model types such as decision forest.

tmostak avatar Feb 10 '23 19:02 tmostak

@tmostak @emmenlau I would appreciate it if you could share more context here. Are you looking for specific models beyond decision forest?

Also a question - would you consider moving to oneDAL interfaces if serialization were implemented there?

napetrov avatar Mar 28 '23 17:03 napetrov

Dear @napetrov thanks for the reply!

For me personally it's ok to move to oneDAL eventually, as I understand this is the future of the library?

If possible it would be great if all models eventually support serialization, because we'd like to store them to disk. Is that a realistic goal?

emmenlau avatar Mar 28 '23 17:03 emmenlau

@napetrov yes ideally all models would support serialization so we can store on disk (we're adding ML training/inference support to our database). However if it helps with prioritization we're currently using the linear regression, random forest, GBT and decision tree models for regression and kmeans and dbscan for clustering.

And yes we'd consider moving to oneDAL proper as soon as we can scope the time to do so, although it would be nice if the daal interfaces still supported serialization of all model types.

tmostak avatar Apr 12 '23 03:04 tmostak

@napetrov Wanted to see if you had any update on this?

tmostak avatar Apr 27 '23 17:04 tmostak

@tmostak, @emmenlau - yes, oneDAL has working serialization. We will also look at the scope of getting DAAL serialization fixed.

napetrov avatar Apr 27 '23 17:04 napetrov