Accessing metadata about operators in a programmatic way

Status: Open · mitar opened this issue 4 years ago · 0 comments

Is your feature request related to a problem? Please describe.

We are building an AutoML system, and we would like to use NimbusML operators as steps in the pipelines our system produces. It looks like this library currently targets mostly manual pipeline development, with a focus on sklearn compatibility. As a result, a lot of information is available only in docstrings, for example.

To make the rest of the issue easier to follow, let's clarify terminology. "Hyper-parameters" is a loaded term, but if we treat them as construction-time arguments to an operator, they generally fall into three types:

  • Proper hyper-parameters, for example, a constant of the SVM kernel.
  • Control arguments, for example, which column to drop in a transformer.
  • Debug arguments, for example, enable debug logging/printing.
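To make the three categories concrete, here is a minimal sketch of how per-argument metadata could be represented so that an AutoML system can filter on it. Everything in it is hypothetical: `ArgKind`, `ArgumentSpec`, and the SVM-like arguments are illustrations, not part of NimbusML.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional, Tuple


class ArgKind(Enum):
    """Hypothetical categories for construction-time arguments."""
    HYPERPARAMETER = "hyperparameter"  # tunable, e.g. an SVM kernel constant
    CONTROL = "control"                # structural, e.g. which column to drop
    DEBUG = "debug"                    # diagnostics, e.g. debug logging


@dataclass(frozen=True)
class ArgumentSpec:
    """Hypothetical metadata record for one constructor argument."""
    name: str
    kind: ArgKind
    value_type: type
    default: Any
    valid_range: Optional[Tuple[Any, Any]] = None  # inclusive bounds, if any
    description: str = ""


# Illustrative specs for an imaginary SVM-like operator.
SVM_ARGS = [
    ArgumentSpec("kernel_constant", ArgKind.HYPERPARAMETER, float, 1.0,
                 (0.0, 100.0), "Constant term of the SVM kernel."),
    ArgumentSpec("drop_column", ArgKind.CONTROL, str, None,
                 description="Column to drop before fitting."),
    ArgumentSpec("verbose", ArgKind.DEBUG, bool, False,
                 description="Print debug output."),
]


def tunable(specs):
    """Return only the arguments an AutoML search should vary."""
    return [s for s in specs if s.kind is ArgKind.HYPERPARAMETER]


print([s.name for s in tunable(SVM_ARGS)])  # ['kernel_constant']
```

With metadata like this, a search would sample only the hyper-parameter entries within their `valid_range`, while control and debug arguments would be set by the pipeline builder.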

We would like to know how we can access metadata about operators so that they can be used automatically, for example:

  • What proper hyper-parameters do operators take? What types of values do they expect? What are the ranges of valid values for them?
  • Which arguments are control arguments, and which are debug arguments? What types of values do they take, and which values are valid?
  • How can we access human-friendly descriptions of arguments?
  • What is stored in the state of a fitted operator? How is that state (also called parameters) represented, what are its keys, and what are the types of their values?
  • How can we tell whether an operator is a model or a transformer? More generally, is there any metadata about what type of operator an operator is? Perhaps you already categorize them in some other way.
  • How can we list all available operators?
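Some of this can already be recovered generically, which also shows where the gaps are. scikit-learn's own `get_params` reads constructor argument names from `inspect.signature(cls.__init__)`, and by scikit-learn convention fitted state lives in attributes whose names end in an underscore. The sketch below applies both conventions to a hypothetical stand-in class (`ExampleScaler` is not a NimbusML operator).

```python
import inspect


class ExampleScaler:
    """Hypothetical sklearn-style transformer used only for illustration."""

    def __init__(self, columns=None, scale=1.0, verbose=False):
        self.columns = columns
        self.scale = scale
        self.verbose = verbose

    def fit(self, X):
        # By sklearn convention, learned state goes in trailing-underscore attributes.
        self.max_ = max(X)
        self.n_samples_ = len(X)
        return self


def constructor_args(cls):
    """Map each named constructor argument to its default value."""
    sig = inspect.signature(cls.__init__)
    skip = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    return {
        name: p.default
        for name, p in sig.parameters.items()
        if name != "self" and p.kind not in skip
    }


def fitted_state(obj):
    """Collect public attributes that follow the trailing-underscore convention."""
    return {k: v for k, v in vars(obj).items()
            if k.endswith("_") and not k.startswith("_")}


print(constructor_args(ExampleScaler))
# {'columns': None, 'scale': 1.0, 'verbose': False}
print(fitted_state(ExampleScaler().fit([3, 1, 2])))
# {'max_': 3, 'n_samples_': 3}
```

This yields names, defaults, and the keys of the fitted state, but nothing says whether `scale` is a proper hyper-parameter or `verbose` a debug argument, what ranges are valid, or whether the class is a model or a transformer.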

Describe the solution you'd like

As I understand it, every ML.NET operator is described in a manifest file, from which you generate the sklearn-compatible Python classes. It would be great if I could access that manifest programmatically, perhaps through a class-level property. I assume most of the answers to the questions above can be found there.
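One possible shape for such an API, sketched under assumptions: each generated class carries its manifest entry as a class attribute exposed through a read-only mapping view, and a registry filled in at class-definition time answers "list all operators". The names `OperatorBase`, `manifest`, `all_operators`, and the manifest keys (`kind`, `inputs`, `role`) are hypothetical; they are not NimbusML's or ML.NET's actual schema.

```python
from types import MappingProxyType


class OperatorBase:
    """Hypothetical base class for generated operator wrappers."""

    _registry = {}
    manifest = MappingProxyType({})  # overridden by each generated subclass

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every operator class as soon as it is defined.
        OperatorBase._registry[cls.__name__] = cls

    @classmethod
    def all_operators(cls):
        """Return the names of all registered operators, sorted."""
        return sorted(cls._registry)


class ExampleClassifier(OperatorBase):
    manifest = MappingProxyType({
        "kind": "model",
        "inputs": [{"name": "c", "type": "float", "role": "hyperparameter"}],
    })


class ExampleDropper(OperatorBase):
    manifest = MappingProxyType({
        "kind": "transformer",
        "inputs": [{"name": "columns", "type": "list[str]", "role": "control"}],
    })


def operators_of_kind(kind):
    """Filter registered operators by the hypothetical manifest 'kind' key."""
    return [name for name in OperatorBase.all_operators()
            if OperatorBase._registry[name].manifest.get("kind") == kind]


print(OperatorBase.all_operators())  # ['ExampleClassifier', 'ExampleDropper']
print(operators_of_kind("model"))    # ['ExampleClassifier']
```

Reading the metadata from the class rather than from an instance lets an AutoML system inspect operators without constructing them.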

Describe alternatives you've considered

I could parse the docstrings of the sklearn-compatible classes you provide, but that seems like a very fragile approach.
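To illustrate the fragility, here is a minimal sketch of such a parser for numpydoc-style "name : type" lines. The regular expression and both docstrings are hypothetical, not taken from NimbusML; a small formatting difference makes the parser silently return nothing.

```python
import re

# Naive numpydoc-style matcher: "name : type" at the start of a line.
PARAM_RE = re.compile(r"^[ \t]*(\w+) : (.+)$", re.MULTILINE)


def parse_params(docstring):
    """Extract {name: type} pairs from a numpydoc 'Parameters' section."""
    return dict(PARAM_RE.findall(docstring))


doc_a = """
Parameters
----------
c : float
    Kernel constant.
"""

# Same information, written slightly differently.
doc_b = """
Parameters
----------
c: float, default=1.0
    Kernel constant.
"""

print(parse_params(doc_a))  # {'c': 'float'}
print(parse_params(doc_b))  # {} because there is no space before ':'
```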

mitar · Oct 16 '19, 11:10