[FR] [Roadmap] Add autologging support for CatBoost

Open · BenWilson2 opened this issue

MLflow Roadmap Item

This is an MLflow Roadmap item that has been prioritized by the MLflow maintainers. We’ve identified this feature as a highly requested addition to the MLflow package based on community feedback. We're seeking a community contribution for the implementation of this feature and will enthusiastically support the development and review of a submitted PR for this.

Contribution Note

As with other roadmap items, multiple contributors may want to work on this issue. While we don’t discourage collaboration, we strongly encourage assigning a primary contributor to each roadmap issue to simplify the merging process. Roadmap items are high priority, and because of the widespread demand for these features, we ask potential contributors to take on the work of creating a PR, making changes, and adding adequate test coverage only if they are willing and able to see the implementation through to a merged state.

Feature scope

This roadmap feature’s complexity is classified as:

  • [ ] good-first-issue: This feature is limited in complexity and effort required to implement.
  • [ ] simple: This feature does not require a large amount of effort to implement and / or is clear enough to not need a design discussion with maintainers.
  • [X] involved: This feature will require a substantial amount of development effort but does not require an agreed-upon design from the maintainers. The feedback given during the PR phase may be involved and necessitate multiple iterations before approval. (Please bear with us as we collaborate with you to make a great contribution)
  • [ ] design-recommended: This is a substantial feature that should have a design document approved prior to working on an implementation (to save your time, not ours). After agreeing to work on this feature, a maintainer will be assigned to support you throughout the development process.

Proposal Summary

Implement autologging support for CatBoost. Many officially supported model flavors already have autologging implementations (e.g., mlflow.xgboost.autolog(), mlflow.lightgbm.autolog(), and mlflow.sklearn.autolog()), which can serve as references for a CatBoost implementation.
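For orientation, the existing autologgers broadly work by patching a library's training entry point so that each `fit` call also logs parameters, metrics, and the model. The block below is a minimal sketch of that patching pattern in isolation, assuming a CatBoost-style estimator whose `fit` accepts a `callbacks` keyword and which exposes `get_params()`. `patch_fit`, `log_params`, and `make_callback` are hypothetical names; MLflow's real autologgers rely on their own internal patching utilities rather than a helper like this.

```python
import functools


def patch_fit(estimator_cls, log_params, make_callback):
    """Wrap ``estimator_cls.fit`` so each call logs params and injects a callback.

    Hypothetical helper for illustration: ``log_params`` receives the dict from
    ``get_params()`` and ``make_callback()`` builds one callback object per fit.
    Returns the original ``fit`` so the patch can be undone.
    """
    original_fit = estimator_cls.fit

    @functools.wraps(original_fit)
    def patched_fit(self, *args, **kwargs):
        # Record the estimator's hyperparameters before training starts.
        log_params(self.get_params())
        # Assumes callbacks are passed by keyword; keep any user-supplied ones.
        callbacks = list(kwargs.get("callbacks") or [])
        callbacks.append(make_callback())
        kwargs["callbacks"] = callbacks
        return original_fit(self, *args, **kwargs)

    estimator_cls.fit = patched_fit
    return original_fit
```

With MLflow, `log_params` could plausibly be `mlflow.log_params`; returning the original `fit` keeps the patch reversible, which matters if autologging is later disabled.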

Note: since CatBoost does not support callbacks, metrics can instead be read from its training log files as they are updated, as referenced here.
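A minimal sketch of the log-file approach, assuming CatBoost's default training directory (`catboost_info`) contains tab-separated files such as `learn_error.tsv` and `test_error.tsv`, each with a header row (`iter` followed by metric names) and one row per iteration. `read_new_rows` is a hypothetical helper: a poller could call it repeatedly during training and forward each new row to `mlflow.log_metrics(values, step=step)`. It ignores a trailing line without a newline, since CatBoost may still be writing it.

```python
def read_new_rows(path, start_row=0):
    """Return (rows, next_row) for complete data rows at index >= start_row.

    Each row is (step, {metric_name: value}). Assumes a tab-separated file whose
    header is 'iter' followed by metric names, as in CatBoost's *_error.tsv logs.
    """
    with open(path, encoding="utf-8") as f:
        lines = f.readlines()
    # Nothing usable until the header line has been fully written.
    if not lines or not lines[0].endswith("\n"):
        return [], start_row
    metric_names = lines[0].rstrip("\r\n").split("\t")[1:]
    # A final line without a newline may still be mid-write, so skip it for now.
    complete = [line for line in lines[1:] if line.endswith("\n")]
    rows = []
    for line in complete[start_row:]:
        fields = line.rstrip("\r\n").split("\t")
        step = int(fields[0])
        values = {name: float(v) for name, v in zip(metric_names, fields[1:])}
        rows.append((step, values))
    return rows, len(complete)
```

Passing the returned `next_row` back in on the next poll means each iteration is logged once, even though the whole file is re-read each time.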

Motivation

  • What is the use case for this feature? Feature parity with other supervised learning flavors in MLflow, and easier integration of the CatBoost package with MLflow tracking.
  • Why is this use case valuable to support for MLflow users in general? To simplify the recording of run metrics and parameters for CatBoost.
  • Why is this use case valuable to support for your project(s) or organization? Same as above.
  • Why is it currently difficult to achieve this use case? This feature is not implemented.

What component(s), interfaces, languages, and integrations does this feature affect?

Components

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [X] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • [ ] area/server-infra: MLflow Tracking server backend
  • [ ] area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces

  • [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

Languages

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

Integrations

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

BenWilson2, Jun 16 '22

For any questions, concerns, or clarification on implementing this issue, please ping @BenWilson2

BenWilson2, Jun 21 '22

@BenWilson2 can I take this one?

rafaelvp-db, Jul 27 '22

@rafaelvp-db Absolutely!

harupy, Jul 27 '22

Hi @rafaelvp-db, how is this going? We'd like to start testing some CatBoost models, so we're wondering whether to implement something ad hoc or wait for this to become available.

mightycommander, Oct 04 '22

Hi @rafaelvp-db, any updates here?

dbczumar, Nov 15 '22

Hi all, I noticed that CatBoost has supported callbacks since version 0.26 (as described here), so I think it should be fairly easy to do something similar to what was done for XGBoost here. WDYT @rafaelvp-db?
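If the callback route works out, a minimal sketch could look like the following. It assumes CatBoost's Python callback protocol as we understand it: an object passed via `fit(..., callbacks=[...])` whose `after_iteration(info)` method runs once per iteration, where `info.iteration` is the iteration number and `info.metrics` maps each dataset name (e.g. `learn`, `validation`) to a dict of metric name to the list of values so far; returning `True` lets training continue. `MlflowMetricsCallback` and the `dataset-metric` key format are hypothetical choices, not an existing MLflow API.

```python
class MlflowMetricsCallback:
    """Hypothetical CatBoost callback that forwards each iteration's metrics.

    ``log_metrics`` must accept (metrics_dict, step=...); by default it is
    ``mlflow.log_metrics``.
    """

    def __init__(self, log_metrics=None):
        if log_metrics is None:
            import mlflow  # imported lazily so the class is usable without MLflow

            log_metrics = mlflow.log_metrics
        self._log_metrics = log_metrics

    def after_iteration(self, info):
        # Take the most recent value of every metric on every evaluated dataset.
        latest = {
            f"{dataset}-{metric}": values[-1]
            for dataset, metrics in info.metrics.items()
            for metric, values in metrics.items()
            if values
        }
        if latest:
            self._log_metrics(latest, step=info.iteration)
        return True  # never request early stopping
```

Usage would then be something like `model.fit(X, y, eval_set=(X_val, y_val), callbacks=[MlflowMetricsCallback()])` inside a `with mlflow.start_run():` block. Metric names containing characters that MLflow rejects would need sanitizing first; this sketch does not do that.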

felipeeeantunes, Dec 11 '22

Hi,

I'd be happy to contribute and finally complete the set with CatBoost, the last missing gradient boosting library. Let me know if this task still needs a contributor.

kabartay, Dec 20 '22

@kabartay it's all yours! Feel free to tag us on the PR when it's ready for review :)

BenWilson2, Dec 21 '22

Hi @kabartay, any updates here?

dbczumar, Jan 09 '23

Hi @dbczumar @BenWilson2,

Thank you very much. I was away due to a relocation and the New Year holidays. I'm starting work this week and have plenty of time to dedicate to it. I will push the first commits by the end of the week.

kabartay, Jan 09 '23

@kabartay Thanks so much! That sounds great! Happy New Year!

dbczumar, Jan 09 '23

@dbczumar Happy New Year too!

kabartay, Jan 09 '23

Hi @kabartay, did you get a chance to work on the catboost integration?

dbczumar, Mar 16 '23

Hi all, any plans to pick this up again? Or is all attention going strictly to LLM support?

dbrami, Sep 15 '23

Hi @dbrami , we'd be thrilled to review and commit a contribution for this feature. Please let me know if you're interested in working on it!

dbczumar, Sep 15 '23

@dbczumar @dbrami I'm sorry for the delay; due to some circumstances, this task kept slipping out of my schedule. I have already started the work: I investigated what would be required and began some development. The contribution process isn't entirely clear to me, though. Should we have a roadmap or design document first?

kabartay, Sep 18 '23