onnxmltools
Missing H2O Converters
I am currently working on converters from H2O to ONNX for most of the missing model types.
I have seen issues #409 and #414, and that some work still needs to be done for the GBM models as well.
The current conversion approach, as explained in the existing GBM converter, uses the H2O MOJO format: a zip file containing model descriptions and metadata as ini/json files, plus binaries for the model values themselves.
This MOJO file is passed as input to the H2O function print_mojo("path/to/mojo.zip", "json").
The Python H2O library delegates this call to the H2O jar file.
print_mojo returns a JSON string, which the converter iterates over to convert each layer/node/operator to ONNX.
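To make the iteration step concrete, here is a minimal sketch of how a nested tree in such a JSON string could be flattened into an indexed node list, which is roughly the shape ONNX tree-ensemble operators expect. The key names (`trees`, `root`, `colId`, `splitValue`, `leftChild`, `rightChild`, `predValue`) and the example JSON are assumptions for illustration, not verified print_mojo output, and the `BRANCH_LT` mode is likewise an assumed mapping.

```python
import json


def flatten_tree(root):
    """Flatten one nested tree into a list of node dicts in pre-order.

    The key names read from `root` are hypothetical; adjust them to the
    actual JSON produced by print_mojo.
    """
    nodes = []

    def visit(node):
        node_id = len(nodes)
        nodes.append(None)  # reserve this slot so children get later ids
        if "leftChild" in node:
            left_id = visit(node["leftChild"])
            right_id = visit(node["rightChild"])
            nodes[node_id] = {
                "mode": "BRANCH_LT",  # assumed: go left when value < split
                "feature": node["colId"],
                "value": node["splitValue"],
                "true_id": left_id,
                "false_id": right_id,
            }
        else:
            nodes[node_id] = {"mode": "LEAF", "value": node["predValue"]}
        return node_id

    visit(root)
    return nodes


# Hand-written stand-in for the JSON string (illustrative schema only).
example_json = json.dumps({
    "trees": [{
        "root": {
            "colId": 0, "splitValue": 2.5,
            "leftChild": {"predValue": -0.1},
            "rightChild": {
                "colId": 1, "splitValue": 0.5,
                "leftChild": {"predValue": 0.2},
                "rightChild": {"predValue": 0.4},
            },
        },
    }],
})

model = json.loads(example_json)
flat = [flatten_tree(tree["root"]) for tree in model["trees"]]
```

In a real converter, the string would come from print_mojo instead of `example_json`, and the flattened lists would then be mapped onto the attributes of an ONNX tree-ensemble node.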
During my research I've encountered 2 major issues with this approach:
- The export of models to the MOJO format is not (yet) fully implemented, as can be seen in their documentation.
  - H2O also provides the models as binaries (I haven't looked into this yet) and as POJOs (Plain Old Java Objects), which would require parsing a Java file.
- The print_mojo("path/to/mojo.zip", "json") function checks whether the model type is tree-based, since it is not meant to serialize every MOJO model as JSON, but to draw the tree structure of certain MOJO types.
  - Distributed Random Forests (DRF) and Isolation Forests (IF) may be implemented using this approach, since they are tree-based and thus are supported and can be exported as a JSON string.
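Since print_mojo only handles tree-based models, a converter could check the model type up front before attempting the JSON export. The following is a rough sketch under stated assumptions: that the MOJO zip contains a `model.ini` with an `[info]` section and an `algo` key, and that the tree-based algorithms are identified as `gbm`, `drf` and `isolationforest`. None of these names are confirmed here; the helper functions are hypothetical.

```python
import io
import zipfile

# Assumed algo identifiers for tree-based models (not verified against H2O).
TREE_ALGOS = {"gbm", "drf", "isolationforest"}


def read_mojo_algo(mojo):
    """Return the 'algo' value from the [info] section of model.ini, or None.

    `mojo` may be a path or a file-like object. The file name, section name
    and key name are assumptions about the MOJO layout.
    """
    with zipfile.ZipFile(mojo) as zf:
        text = zf.read("model.ini").decode("utf-8")
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "info" and "=" in line:
            key, _, value = line.partition("=")
            if key.strip() == "algo":
                return value.strip()
    return None


def is_tree_based(mojo):
    """True if the MOJO's declared algo is in the assumed tree-based set."""
    return read_mojo_algo(mojo) in TREE_ALGOS


def make_fake_mojo(algo):
    """Build an in-memory zip that mimics the assumed MOJO layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("model.ini", f"[info]\nalgo = {algo}\n\n[columns]\nx1\n")
    buf.seek(0)
    return buf


gbm_ok = is_tree_based(make_fake_mojo("gbm"))
glm_ok = is_tree_based(make_fake_mojo("glm"))
```

A check like this would let the converter raise a clear "unsupported model type" error instead of failing inside the JSON export.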
As I see it now, I may be limited to implementing DRF and IF models with the current workflow and to updating the conversion for GBM models.
I have most likely overlooked features provided by H2O, as it is a huge framework, and would ask for advice if someone else knows of another possible way to parse the models provided by H2O.
Edit: I forgot to mention the relevant H2O Jira issues:
- Implement a convertor from GBM MOJO to ONNX - initial ONNX request, led to the current implementation in PR
- Unsupported MOJO type - back-references #409 and has further details on the conversation about how to implement the model ingest.
I've opened a feature request on the H2O Jira board to implement an intermediary JSON representation: https://h2oai.atlassian.net/browse/PUBDEV-8690