h2o-3
Converting H2O MOJO models to Binary
H2O version, Operating System and Environment
h2o versions: 3.28.1.2, 3.42.0.3
Actual behavior
I am trying to upgrade a binary (DRF) H2O model to a higher version (from v3.28.1.2 to v3.42.0.3). Due to technical restrictions, I cannot use the MOJO format for deployment. Instead, I am using the following procedure:
- load the binary model in the older version
- export the model as MOJO format
- load the model in MOJO format in the newer version
- save the model in Binary in the newer version
I am able to save the model in the newer version. However, when I try to load the model object (using h2o.load_model()) in a different session (i.e. not straight after saving it to Binary), I get the following error:
Expected behavior
H2OServerError: HTTP 500 Server Error:
Server error java.lang.NullPointerException:
  Error: Caught exception: java.lang.NullPointerException
  Request: None
  Stacktrace: java.lang.NullPointerException
      hex.generic.GenericModel$GenModelSource.backingByteVec(GenericModel.java:391)
      hex.generic.GenericModel$GenModelSource.get(GenericModel.java:373)
      hex.generic.GenericModel.genModel(GenericModel.java:325)
      hex.generic.GenericModel.havePojo(GenericModel.java:546)
      water.api.schemas3.ModelSchemaV3.fillFromImpl(ModelSchemaV3.java:80)
      water.api.schemas3.ModelSchemaV3.fillFromImpl(ModelSchemaV3.java:22)
      water.api.ModelsHandler.importModel(ModelsHandler.java:263)
      sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
Steps to reproduce
-- converting the model
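import h2o
import tempfile
from h2o.estimators import H2ORandomForestEstimator, H2OGenericEstimator
from tests import pyunit_utils  # assumed: helper module from the h2o-3 test suite

# Train a small DRF model.
# Note: x (predictor columns) and y (response column) are not defined in the original report.
airlines = h2o.import_file(path=pyunit_utils.locate("smalldata/testng/airlines_train.csv"))
drf = H2ORandomForestEstimator(ntrees=1)
drf.train(x=x, y=y, training_frame=airlines)

# Export the trained model as a MOJO into a temporary directory.
original_dir = tempfile.mkdtemp()
original_model_filename = drf.download_mojo(original_dir)

# Import the MOJO as a generic model, then save it in binary format.
model = H2OGenericEstimator.from_file(original_model_filename)
modelBinary = model.download_model(original_dir)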
---- loading the Binary model (this works if it is executed in the same session in which the Binary object was stored, but not if I do it in a different session)
model2 = h2o.load_model(modelBinary)
pred2 = model2.predict(airlines)
Hi @miegielsen. Thanks for reporting this issue.
From our doc:
When saving an H2O binary model with h2o.saveModel (R), h2o.save_model (Python), or in Flow, you will only be able to load and use that saved binary model with the same version of H2O that you used to train your model. H2O binary models are not compatible across H2O versions. If you update your H2O version, then you will need to retrain your model. For production, you can save your model as a POJO/MOJO. These artifacts are not tied to a particular version of H2O because they are just plain Java code and do not require an H2O cluster to be running.
So, you can load a binary model only in the H2O version it was created in. You can then save it as a MOJO and load that MOJO in a different version. If you need to use only binary models, the H2O versions must match. You can save the imported MOJO model as binary, but that binary model can only be used in the same H2O version in which it was saved.
cc: @wendycwong, we should throw an error that says that loading the binary model across versions is impossible.
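Until such an error exists on the server side, a client-side guard could look roughly like the sketch below. It is only an illustration: the sidecar file convention and the function names `write_version_sidecar` and `check_version_sidecar` are hypothetical and not part of H2O. The idea is to record the H2O version next to the saved binary model and compare it with the running version before calling h2o.load_model().

```python
import json
from pathlib import Path


def _sidecar_path(model_path):
    # Hypothetical convention: "<model file>.version.json" next to the model.
    return Path(str(model_path) + ".version.json")


def write_version_sidecar(model_path, h2o_version):
    """Record the H2O version that produced the binary model at model_path."""
    sidecar = _sidecar_path(model_path)
    sidecar.write_text(json.dumps({"h2o_version": h2o_version}))
    return sidecar


def check_version_sidecar(model_path, running_version):
    """Raise RuntimeError if the recorded version differs from running_version."""
    saved = json.loads(_sidecar_path(model_path).read_text())["h2o_version"]
    if saved != running_version:
        raise RuntimeError(
            "Binary model was saved with H2O %s but the running version is %s; "
            "binary models are not compatible across H2O versions."
            % (saved, running_version)
        )
```

In a real session, one would presumably call `write_version_sidecar(path, h2o.__version__)` right after saving the binary model and `check_version_sidecar(path, h2o.__version__)` right before h2o.load_model(path). This guards only against version mismatches.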