
Revamp JSON importer to make it easy to use

Open hcho3 opened this issue 1 year ago • 3 comments

hcho3 avatar Aug 14 '23 18:08 hcho3

I am very interested in this issue and understanding what progress there is here.

I see the PR adding JSON importing to the C and Python APIs: https://github.com/dmlc/treelite/pull/448, but it seems this has since been removed. There is still the ability to call dump_as_json on any model, but are there any utilities to load these files back? I think my question may be a duplicate of #11, but there seems to have been a lot of development since that issue was closed.

stephenpardy avatar Aug 19 '24 17:08 stephenpardy

@stephenpardy

> what progress there is here.

I haven't gotten around to writing the JSON importer yet, because I wasn't sure what kind of interface would work best for it. The last iteration (import_from_json from Treelite 3.9) was clunky to use and had many gotchas. Also, the JSON importer cannot simply consume the output of the dump_as_json function, since that output omits some information that is necessary to preserve the integrity of the model through a round-trip serialization.

Can you describe what your use case would be? I'd like to learn how you plan to use the JSON importer so that I can pick the best design.

hcho3 avatar Aug 21 '24 23:08 hcho3

@hcho3 I am looking for a way to load tree models from a variety of sources (e.g. XGBoost, LightGBM) and then save those models in a stable way so that they can be loaded and served at a later time.

I see the serialize and deserialize methods which seem to meet my needs - and there is even some nice backwards compatibility promised by the docs. I think that is enough for now, but having a human-readable format such as JSON would be much preferred over the binary one if possible (similar to how xgboost now defaults to JSON over the old binary one).
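For anyone else landing here, this is roughly the workflow I have in mind. It is only a sketch: it assumes Treelite 4.x, where I believe the loaders live under treelite.frontend and the binary checkpoint goes through Model.serialize / Model.deserialize. The helper names (load_source_model, save_checkpoint, load_checkpoint) are my own and not part of Treelite.

```python
from pathlib import Path


def load_source_model(path, kind):
    """Load an XGBoost or LightGBM model file into a Treelite model.

    Assumes Treelite 4.x, where these loaders are exposed under
    treelite.frontend; older releases used a different entry point.
    """
    import treelite  # imported lazily so the other helpers work without it

    loaders = {
        "xgboost": treelite.frontend.load_xgboost_model,
        "lightgbm": treelite.frontend.load_lightgbm_model,
    }
    return loaders[kind](str(path))


def save_checkpoint(model, path):
    """Write the model with its binary serializer and return the path."""
    path = Path(path)
    model.serialize(str(path))
    return path


def load_checkpoint(path):
    """Read back a checkpoint written by save_checkpoint."""
    import treelite

    return treelite.Model.deserialize(str(path))
```

Usage would be something like save_checkpoint(load_source_model("model.json", "xgboost"), "model.tl") at training time and load_checkpoint("model.tl") in the serving process. The file names are placeholders.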

stephenpardy avatar Aug 24 '24 14:08 stephenpardy