
About Models Serialization

Open weicongs-amazon opened this issue 4 years ago • 2 comments

Ask the question

I am exploring using Tribuo in our project, where we need to save ML models for future use. I see that the models implement the java.io.Serializable interface. I think this is the only supported way to serialize them, right? If so, I am concerned that future upgrades will be very hard, for instance if the schema or interface of some algorithms changes. Is there any advice for addressing this concern? Thanks a lot for the help.
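For reference, the java.io.Serializable route asked about here comes down to the standard JDK object streams. The following is a minimal sketch, assuming a hypothetical `ToyModel` stand-in rather than a real Tribuo model class, and hypothetical `save`/`load` helpers; it only illustrates the round trip, not Tribuo's own API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;

class SerializationSketch {
    // Hypothetical stand-in for a trained model; NOT a Tribuo class.
    static final class ToyModel implements Serializable {
        // Pinning serialVersionUID keeps the stream format stable when
        // compatible changes are made to the class later.
        private static final long serialVersionUID = 1L;
        final double[] weights;

        ToyModel(double[] weights) {
            this.weights = weights.clone();
        }
    }

    // Writes the object with standard Java serialization and returns the bytes.
    static byte[] save(Serializable model) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads an object back from bytes produced by save().
    static Object load(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class of serialized object not found", e);
        }
    }

    public static void main(String[] args) {
        ToyModel original = new ToyModel(new double[] {0.5, -1.25});
        ToyModel restored = (ToyModel) load(save(original));
        System.out.println(Arrays.equals(original.weights, restored.weights));
    }
}
```

In a real project the byte array would be replaced by a file stream, and only trusted files should be deserialized this way.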

weicongs-amazon, Mar 10 '21 00:03

We discuss our compatibility guidance in the FAQ. Basically, we guarantee that serialized models are upwards compatible within a major version (e.g., a model trained in 4.0 will work with any release in the 4.x series, and a model trained in 4.1 will work in 4.1 or later until we hit v5). We currently exclude the TensorFlow models from this guarantee, as we're transitioning from the TF 1.x releases to the new tensorflow-java releases, and TF-Java has only just started to support TF's usual serialization mechanisms. Hopefully, as TF-Java stabilises, we'll be able to stabilise Tribuo's TF-based models and provide the same upwards compatibility guarantee.

We take serialisation compatibility seriously. For example, the upcoming 4.1 release contains a complete refactor of LinearSGDModel and LinearSGDTrainer to allow them to use optimised dense math functions, but LinearSGDModels from 4.0 will load fine in 4.1. However, if they are then saved out by Tribuo 4.1 they won't load in Tribuo 4.0, as we only provide upwards compatibility. If a model from a 4.x release won't load in a later 4.y release, then that's a bug and we'll fix it.
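The upwards compatibility rule described above can be written down as a small check. This is only a sketch of the stated guarantee: `CompatibilitySketch`, `majorMinor` and `canLoad` are hypothetical names, not part of Tribuo, and ignoring the patch component of the version is an assumption.

```java
class CompatibilitySketch {
    // Parses "major.minor[.patch]" into {major, minor}.
    // Assumption: the patch component does not affect compatibility, so it is ignored.
    static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Expected major.minor[.patch], got: " + version);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    // Hypothetical helper: true when a model saved by `trainedWith` falls under the
    // guarantee for `runtime`, i.e. same major version and an equal or newer minor version.
    static boolean canLoad(String trainedWith, String runtime) {
        int[] trained = majorMinor(trainedWith);
        int[] current = majorMinor(runtime);
        return trained[0] == current[0] && current[1] >= trained[1];
    }

    public static void main(String[] args) {
        System.out.println(canLoad("4.0", "4.1")); // true: 4.0 model in a later 4.x release
        System.out.println(canLoad("4.1", "4.0")); // false: only upwards compatibility
        System.out.println(canLoad("4.1", "5.0")); // false: guarantee stops at the major version
    }
}
```

A `false` result means only that the combination falls outside the guarantee, not that loading is certain to fail.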

We're looking at alternative serialisation mechanisms to replace java.io.Serializable, as mentioned in the roadmap. I think we're leaning towards a protobuf-based mechanism, but it's not decided yet. We're also looking at adding ONNX export support for Tribuo models. Both of these are likely to land in the 4.2 release, and we'll support converting back and forth between the new mechanism and java.io.Serializable. Note that once a model has been exported as ONNX it will only load back in using Tribuo's ONNX support; it won't reconstruct the original Tribuo model class. Once we've adopted new serialisation mechanisms, we hope to provide upwards compatibility across major versions.

Craigacp, Mar 10 '21 02:03

We've started landing ONNX export support in preparation for the 4.2 release - https://github.com/oracle/tribuo/pull/154. Initially this will cover a subset of Tribuo's models, and it probably won't support arbitrary recursion over ensembles (i.e., an ensemble made up of ensembles). The 4.2 release will also have the start of an alternative serialization mechanism, as you'll be able to serialize provenance objects into protobufs. This will expand in future releases to allow serialization of all of Tribuo's models as protobufs.

Craigacp, Aug 30 '21 17:08

As of 4.3, all models can be serialized into protobufs, and we'll use that as the sole serialization mechanism for v5, replacing java.io.Serializable (ONNX export will still be available for a subset of Tribuo models).

Craigacp, Jul 25 '23 19:07