NimbusML Document the internal JSON representation of the pipelines

Is your feature request related to a problem? Please describe.

As described in issue #334, we are building an AutoML system in Python. We have our own pipeline representation and would like to use NimbusML operators as operators in our pipelines.

Describe the solution you'd like

To me, the best approach seems to be for you to document the JSON representation of pipelines so that we can use it directly. We could then write a converter from our pipeline language to that JSON representation and execute the pipeline by passing the JSON to you directly.

So, two feature requests:

1. Document the JSON representation.
2. Provide a public API that accepts the JSON representation and runs it, ideally in two phases, fit and predict. Optionally it would return the output values after every operator, but at least the final output value(s) of the pipeline.
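
To illustrate the kind of converter meant here, a minimal sketch follows. It assumes a graph made of nodes with "Name", "Inputs" and "Outputs" fields, and variables written with a "$" prefix; that shape is my reading of the ML.NET entry point documentation and should be checked against it. The function name, the step names ("Example.Scale", "Example.Cluster") and their parameters are hypothetical and are not real NimbusML or ML.NET entry points.

```python
import json


def to_entrypoint_graph(steps, input_var="$input_data"):
    """Convert a linear list of (name, params) steps into a graph dict.

    Each node reads the previous node's output variable and writes a new one.
    The field names used here are an assumption, not a documented contract.
    """
    nodes = []
    data_var = input_var
    for index, (name, params) in enumerate(steps):
        output_var = f"$data_{index}"
        inputs = dict(params)      # copy so the caller's dict is not modified
        inputs["Data"] = data_var  # chain this node to the previous output
        nodes.append({
            "Name": name,
            "Inputs": inputs,
            "Outputs": {"OutputData": output_var},
        })
        data_var = output_var
    return {"Nodes": nodes}


# Hypothetical two-step pipeline in "our" representation.
steps = [
    ("Example.Scale", {"Column": "x"}),
    ("Example.Cluster", {"ClusterCount": 3}),
]
graph = to_entrypoint_graph(steps)
graph_json = json.dumps(graph, indent=2)
print(graph_json)
```

The resulting string is what we would hand to a public "run this JSON" API if one existed; the sketch only covers linear pipelines.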

Describe alternatives you've considered

One way to do that is to use the sklearn-compatibility classes directly, but that goes through multiple levels of abstraction: we have to convert our pipelines into sklearn pipelines so that NimbusML can convert them to JSON and then send that over to .NET to run. I think it would be much easier if we could provide the JSON directly.

Oct 16 '19 11:10 mitar

Hi,

Please see the entrypoint graph documentation for ML.NET (https://github.com/dotnet/machinelearning/blob/master/docs/code/EntryPoints.md) and also some brief discussion here (https://github.com/microsoft/NimbusML/blob/341e01ab8d97af2ca8408dacf0b169f6d219d4c0/docs/developers/entrypoints.md) in this repo.

The usual way that I examine this entrypoint graph is to look at the input of px_call (see here https://github.com/microsoft/NimbusML/blob/15f12859273ea0b38bccbf5e7699bfb51c997013/src/python/nimbusml/internal/utils/entrypoints.py#L269).

Hope this helps.

Oct 17 '19 12:10 zyw400

I see. Thanks. So if I create a JSON to represent the pipeline, there is no public API yet to run that? Or am I missing anything?

Oct 17 '19 12:10 mitar

Right. It is not exposed in Python yet.

Oct 17 '19 12:10 zyw400

One more question. In this documentation, KMeansPlusPlus is used as an example. What I do not understand is why the normalize parameter is defined in the docstring rather than auto-generated. Other parameters seem to be auto-generated.

Oct 17 '19 13:10 mitar

Most likely we did not like the auto-generated text and wanted to add more detail to it. For an example where we keep the auto-generated text for the same parameter, see: https://github.com/microsoft/NimbusML/blob/master/src/python/nimbusml/ensemble/fastforestbinaryclassifier.py

The content in the docstring is "patched" into the generated Python classes by the auto-gen program (https://github.com/microsoft/NimbusML/blob/46a14e6ddb921a243f269cdc56bc3fda05e13fa1/src/python/tools/entrypoint_compiler.py#L417).

Oct 17 '19 13:10 zyw400

Hm, but then why is the normalize parameter not present twice in KMeansPlusPlus?

Oct 17 '19 13:10 mitar

If a parameter description already exists in the auto-generated files, the old content is overwritten by the docstring, so the parameter appears only once. https://github.com/microsoft/NimbusML/blob/46a14e6ddb921a243f269cdc56bc3fda05e13fa1/src/python/tools/doc_builder.py#L124
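
As a rough illustration of that override behavior (not the actual doc_builder.py code), the merge can be thought of as a dictionary update keyed by parameter name. The function name and the description strings below are made up for the example.

```python
def patch_descriptions(generated, from_docstring):
    """Return parameter descriptions with docstring text taking priority.

    A parameter present in both sources keeps a single entry, holding the
    docstring text; parameters only in `generated` are left unchanged.
    """
    patched = dict(generated)       # start from the auto-generated text
    patched.update(from_docstring)  # docstring entries replace matching keys
    return patched


# Hypothetical descriptions, for illustration only.
generated = {
    "normalize": "Auto-generated description of normalize.",
    "n_clusters": "Auto-generated description of n_clusters.",
}
from_docstring = {
    "normalize": "Hand-written description of normalize.",
}
patched = patch_descriptions(generated, from_docstring)
print(patched["normalize"])
```

Because the merge is keyed by name, normalize ends up with one entry rather than two.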

Oct 17 '19 13:10 zyw400

I see, nice. Thanks for pointing this out.

Oct 17 '19 13:10 mitar
