yocto-gl icon indicating copy to clipboard operation
yocto-gl copied to clipboard

[FR] [Roadmap] Support 'OR' filtering in search

Open BenWilson2 opened this issue 2 years ago • 1 comments

MLflow Roadmap Item

This is an MLflow Roadmap item that has been prioritized by the MLflow maintainers. We’ve identified this feature as a highly requested addition to the MLflow package based on community feedback. We're seeking a community contribution for the implementation of this feature and will enthusiastically support the development and review of a submitted PR for this.

Contribution Note

As with other roadmap items, there may be a desire for multiple contributors to work on an issue. While we don’t discourage collaboration, we strongly encourage that a primary contributor is assigned to roadmap issues to simplify the merging process. The items on the roadmap are of a high priority. Due to the wide-spread demand of roadmap features, we encourage potential contributors to only agree to take on the work of creating a PR, making changes, and ensuring that test coverage is adequately created for the feature if they are willing and able to see the implementation through to a merged state.

Feature scope

This roadmap feature’s complexity is classified as:

  • [ ] good-first-issue: This feature is limited in complexity and effort required to implement.
  • [ ] simple: This feature does not require a large amount of effort to implement and / or is clear enough to not need a design discussion with maintainers.
  • [ ] involved: This feature will require a substantial amount of development effort but does not require an agreed-upon design from the maintainers. The feedback given during the PR phase may be involved and necessitate multiple iterations before approval. (Please bear with us as we collaborate with you to make a great contribution)
  • [X] design-recommended: This is a substantial feature that should have a design document approved prior to working on an implementation (to save your time, not ours). After agreeing to work on this feature, a maintainer will be assigned to support you throughout the development process.

Proposal Summary

Implement the ability within the MLflow UI to conduct searches with an 'OR' syntax (currently 'AND' is only supported). This implementation will need to function for both FileStore and SQLAlchemyStore.

Note: Due to the potential complexity, it is strongly advised that an implementation design document (RFC) is submitted so that a maintainer can assist you in minimizing potential rework and frustration on your part.

Duplicates: #4135 #6069

Motivation

What is the use case for this feature?

To support filtering on multiple conditions within the UI search functionality.

Why is this use case valuable to support for MLflow users in general?

This has been requested many times in the past.

Why is this use case valuable to support for your project(s) or organization? ^

Why is it currently difficult to achieve this use case?

The workaround to this is to extract data from MLflow and perform manual filtering to generate subsets of runs that meet custom search criteria.

What component(s), interfaces, languages, and integrations does this feature affect?

Components

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [X] area/server-infra: MLflow Tracking server backend
  • [X] area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces

  • [X] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [X] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

Languages

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

Integrations

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

BenWilson2 avatar Jun 16 '22 18:06 BenWilson2

For any questions, concerns, or clarification on implementing this issue, please ping @dbczumar

BenWilson2 avatar Jun 21 '22 18:06 BenWilson2

@dbczumar is this still up for grabs?

I’ll be down to start brainstorming.

couple thoughts

  1. If mlflow supports OR operator, it would also have to support parenthesis operator generally to avoid ambiguity (I know it’s already supported for the IN operator)
  2. As a first idea taking inspiration from this https://julien.danjou.info/simple-filtering-syntax-tree-in-python/ and the existing operator mapping for the sql databases it can prob be generated to work recursively to build filterer. Each store would override the operators as just functions mapped in a dict like they are now.

AndersonReyes avatar Jan 16 '23 06:01 AndersonReyes

Hi @AndersonReyes , absolutely! This is still up for grabs! Welcome back :)

I agree that supporting parentheses is critical here, and I think recursive processing of the filter makes a lot of sense. I wonder how well sqlparse (used here: https://github.com/mlflow/mlflow/blob/7101d571942cd29b3358334599e6ed3e2e248d0f/mlflow/utils/search_utils.py#L1199) supports this natively. Would you be up for building out the search filter parsing to produce the operator map? From there, I think you're spot on that translating the operator mapping into either a collection of Python comparators (for FileStore) or SQL filter clauses (SQLAlchemyStore) should do it.

dbczumar avatar Jan 16 '23 07:01 dbczumar

Hi @AndersonReyes , absolutely! This is still up for grabs! Welcome back :)

I agree that supporting parentheses is critical here, and I think recursive processing of the filter makes a lot of sense. I wonder how well sqlparse (used here:

https://github.com/mlflow/mlflow/blob/7101d571942cd29b3358334599e6ed3e2e248d0f/mlflow/utils/search_utils.py#L1199

) supports this natively. Would you be up for building out the search filter parsing to produce the operator map? From there, I think you're spot on that translating the operator mapping into either a collection of Python comparators (for FileStore) or SQL filter clauses (SQLAlchemyStore) should do it.

Yeah I’ll be up for it!

Yeah looking at the py parsed docs look like the return a tree like structure of tokens and token list for hierarchy. https://sqlparse.readthedocs.io/en/latest/analyzing/#analyzing-the-parsed-statement So I’ll start there first see how far I can get trying to setup a quick Proof of concept.

alternatively there is https://github.com/tobymao/sqlglot#sql-execution but seems experimental and would require a new library but I’ll leave that for after trying the above.

AndersonReyes avatar Jan 16 '23 16:01 AndersonReyes

got little hack working here https://github.com/AndersonReyes/mlflow/blob/FR-6075/mlflow/utils/search_utils.py#L1218 to output a list of comparators including for OR, And, and nested parenthesis.

next step is to use this to build file store matcher/filterer and for sql alchemy mapping the comparator to native func recursively. We can chat design options after hack to have something concrete. there might be options to explore.

AndersonReyes avatar Jan 19 '23 06:01 AndersonReyes

got little hack working here https://github.com/AndersonReyes/mlflow/blob/FR-6075/mlflow/utils/search_utils.py#L1218 to output a list of comparators including for OR, And, and nested parenthesis.

next step is to use this to build file store matcher/filterer and for sql alchemy mapping the comparator to native func recursively. We can chat design options after hack to have something concrete. there might be options to explore.

This is awesome! Thanks so much, @AndersonReyes ! :D. Sounds great!

dbczumar avatar Jan 19 '23 06:01 dbczumar

@dbczumar Actually was able to just use the existing code patterns to support Or clause and parenthesis with recursion. https://github.com/mlflow/mlflow/compare/master...AndersonReyes:mlflow:FR-6075

lmk what you think. Added tests to make sure changes work without breaking existing code. Some tests still fail in other places like the rest store etc which im trying to debug but this seems like the lowest effort approach.

For example by using new types in the dicts here when buildin the comparators was able to allow recursive evaluation for conditionals and parenthesis in the matching functions here

I've only modifed code to support it for run searching. Looking at the experiment, model, and model version search utils I'm pretty confident the same pattern works in those search utils also given the classes are structured with the same patterns as SearchUtils.

Lmk how we can proceed:

  1. Are there any other ideas we can try?
  2. For the design doc is there a template databricks/mlflow maintainers use? I can get started describing this approach at least in a doc.

AndersonReyes avatar Jan 22 '23 17:01 AndersonReyes

Hi @AndersonReyes here's the design doc template that we use for OSS MLFlow: https://docs.google.com/document/d/1AQGgJk-hTkUo0lTkGqCGQOMelQmz05kQz_OA4bJWaJE/edit

BenWilson2 avatar Jan 23 '23 14:01 BenWilson2

Hi @AndersonReyes , your approach from https://github.com/mlflow/mlflow/issues/6075#issuecomment-1399551648 sounds great to me. Is this still something you're able to contribute?

dbczumar avatar Feb 15 '23 00:02 dbczumar

Hi @AndersonReyes , your approach from #6075 (comment) sounds great to me. Is this still something you're able to contribute?

Hey sorry I've been unresponsive, (got a little weird with recent layoffs at work) but I need to do the RFC writeup so the changes are documented

AndersonReyes avatar Feb 21 '23 18:02 AndersonReyes

Hi @AndersonReyes , your approach from #6075 (comment) sounds great to me. Is this still something you're able to contribute?

Hey sorry I've been unresponsive, (got a little weird with recent layoffs at work) but I need to do the RFC writeup so the changes are documented

Hope all is well! Sounds great :). Thank you for looking into this!

dbczumar avatar Feb 21 '23 18:02 dbczumar

@dbczumar https://github.com/mlflow/mlflow/pull/7972

AndersonReyes avatar Mar 06 '23 14:03 AndersonReyes