[FR] Support 'Feature Consumer' Flow within Azure ML

Open cauldnz opened this issue 2 years ago • 0 comments

Willingness to contribute

Yes. I can contribute a fix for this bug independently.

Feature Request Proposal

Allow users to query and retrieve offline features using drag-and-drop components within the Azure ML Designer.

Motivation

What is the use case for this feature?

This feature will allow users familiar with the Azure ML designer to consume data from one or more Feathr feature stores within their workflow. Letting component steps in the designer flow read feature metadata from the feature store means users can perform data-related tasks (dataset splits, etc.) as they would with the native Data Stores in Azure ML.

Details

By implementing this (at least initially) using custom designer components, we avoid the need to tightly couple Feathr to the Azure ML Data Stores and Data Asset capability.

  1. Connect to feature registry
  2. Retrieve available features (metadata)
  3. Graphically select features to be included in query that builds training set
  4. Execute query and return training set
  5. Expose the training set as a 'Data output' output on the final component
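To make the five steps concrete, here is a minimal sketch of the flow in plain Python. Everything in it is an assumption for illustration: `InMemoryRegistry`, `FeatureMeta`, `FeathrContext`, and the helper functions are hypothetical stand-ins, not Feathr or Azure ML APIs, and the join is a simple key lookup rather than the point-in-time join a real feature query would perform.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureMeta:
    """Metadata for one registered feature (hypothetical shape)."""
    name: str
    dtype: str


@dataclass
class InMemoryRegistry:
    """Hypothetical stand-in for a feature registry, keyed by project name."""
    projects: dict = field(default_factory=dict)

    def list_features(self, project):
        return list(self.projects.get(project, []))


@dataclass
class FeathrContext:
    """Step 1: holds the registry connection and the selected project."""
    registry: InMemoryRegistry
    project: str


def available_features(ctx):
    """Step 2: read feature metadata for the selected project."""
    return ctx.registry.list_features(ctx.project)


def select_features(available, wanted):
    """Step 3: keep the requested feature names, rejecting unknown ones."""
    known = {f.name for f in available}
    missing = [name for name in wanted if name not in known]
    if missing:
        raise KeyError(f"Unknown features: {missing}")
    return list(wanted)


def build_training_set(observations, offline_store, key, features):
    """Steps 4-5: attach the selected feature values to each observation row.

    The returned list of rows plays the role of the 'Data output'.
    """
    rows = []
    for obs in observations:
        values = offline_store.get(obs[key], {})
        row = dict(obs)
        for name in features:
            row[name] = values.get(name)  # None when no value is stored
        rows.append(row)
    return rows
```

In a designer pipeline, each function would roughly correspond to one component (or one port of a component), with the selection in step 3 driven by the graphical picker instead of a Python list.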

Specific implementation plan to be documented by @cauldnz below.

  • [ ] Proposed Components:
    • [ ] Feathr Context: Feathr client object; specifies the registry and the project within that registry.
    • [ ] Feature Query: Reads feature metadata from the registry for the selected project and lets users select which features to include in the feature query. (Modelled on the out-of-the-box Select Columns in Dataset component.)
    • [ ]
  • [ ] Authentication Approach: Design an authentication approach that keeps complexity for the user to a minimum. This likely needs to support both running under the user's identity and running under the identity of a service principal when code is executed on a compute cluster.
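
One possible way to support both identities with little user-facing configuration is to pick the mode from the environment the component runs in. The sketch below is only an assumption about how that choice could be made: `select_auth_mode` is a hypothetical helper, and the variable names follow the convention used by the azure-identity library for service principal credentials. Whether the components would rely on that library is not decided in this issue.

```python
import os

# Environment variables conventionally used for service principal credentials.
SERVICE_PRINCIPAL_VARS = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")


def select_auth_mode(env=None):
    """Return "service_principal" or "user_identity" (hypothetical helper).

    A fully configured service principal wins; a partially configured one is
    treated as an error so that a misconfigured cluster does not silently
    fall back to a user login.
    """
    env = os.environ if env is None else env
    present = [name for name in SERVICE_PRINCIPAL_VARS if env.get(name)]
    if len(present) == len(SERVICE_PRINCIPAL_VARS):
        return "service_principal"
    if present:
        missing = sorted(set(SERVICE_PRINCIPAL_VARS) - set(present))
        raise ValueError(f"Incomplete service principal configuration, missing: {missing}")
    return "user_identity"
```

A user working interactively would set none of these variables and get the user-identity path, while a compute cluster job would be configured with all three.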

What component(s) does this feature request affect?

  • [X] Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
  • [ ] Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
  • [ ] Feature Registry API: The frontend API layer supports SQL and Purview (Atlas) as storage. The API layer is in Python (FastAPI).
  • [ ] Feature Registry Web UI: The Web UI for the feature registry. Written in React.

cauldnz, Sep 07 '22 19:09