mlflow [FR] Make log_param() return the passed parameter value (expression for functional use)

MLflow Roadmap Item

This is an MLflow Roadmap item that has been prioritized by the MLflow maintainers. We’ve identified this feature as a highly requested addition to the MLflow package based on community feedback. We're seeking a community contribution for the implementation of this feature and will enthusiastically support the development and review of a submitted PR for this.

Feature scope

This roadmap feature’s complexity is classified as:

[X] good-first-issue: This feature is limited in complexity and effort required to implement.

Feature Request

This proposal is to make mlflow.log_param() return the passed parameter value so it can be used in a functional way, as an expression. This will reduce the likelihood of usage errors and increase the speed of development for machine learning practitioners.

Proposal Summary

This feature will allow mlflow.log_param() to return the logged parameter value. This would allow the in-line use of log_param, ensuring that the logged value is actually identical to the value used in the model or process.

Motivation

- What is the use case for this feature?

This feature can be used in machine learning experiments to ensure that the parameter value used is the same as the value that is logged.

- Why is this use case valuable to support for MLflow users in general?

This use case is valuable because it enables rapid, agile development while reducing the likelihood of errors. It's also backwards compatible: existing calls that ignore the return value keep working unchanged.

- Why is this use case valuable to support for your project(s) or organization?

This use case supports all data science efforts across our companies' projects.

- Why is it currently difficult to achieve this use case? (please be as specific as possible about why related MLflow features and components are insufficient)

This use case cannot currently be achieved because mlflow.log_param() is an imperative call that does not return the value, so it cannot be used inside an expression.

What component(s), interfaces, languages, and integrations does this feature affect?

Components

[ ] area/artifacts: Artifact stores and artifact logging
[ ] area/build: Build and test infrastructure for MLflow
[ ] area/docs: MLflow documentation pages
[ ] area/examples: Example code
[ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
[ ] area/models: MLmodel format, model serialization/deserialization, flavors
[ ] area/projects: MLproject format, project running backends
[ ] area/scoring: Local serving, model deployment tools, spark UDFs
[ ] area/server-infra: MLflow server, JavaScript dev server
[x] area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces

[ ] area/uiux: Front-end, user experience, JavaScript, plotting
[ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
[ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
[ ] area/windows: Windows support

Languages

[ ] language/r: R APIs and clients
[ ] language/java: Java APIs and clients
[ ] language/new: Proposals for new client languages

Integrations

[ ] integrations/azure: Azure and Azure ML integrations
[ ] integrations/sagemaker: SageMaker integrations
[ ] integrations/databricks: Databricks integrations

Details

Currently mlflow.log_param() is an imperative statement. Because of this, multiple lines of code are needed to use log_param and parametrize a model. In particular:

1. A variable must be created and set to the desired value (ideally a constant, which isn't really supported in Python).
2. The machine learning model is parametrized with that variable's value.
3. mlflow.log_param() is called using that variable's value.

For example:

NUM_HIDDEN = 30
model.add(Dense(NUM_HIDDEN))
mlflow.log_param("NUM_HIDDEN", NUM_HIDDEN)  # Current MLflow usage pattern

Several things can go wrong when an implementer uses the current interface:

- The implementer might inadvertently use a different variable name from the name used in the log_param call. This could cause confusion in interpreting the logged parameters w.r.t. the source code (e.g., mlflow.log_param("NUM_HIDDEN_UNITS", NUM_HIDDEN)).
- The implementer might inadvertently change the value of NUM_HIDDEN between steps 2 and 3. This would result in the wrong value being logged.
- The implementer might inadvertently omit or skip step 3, resulting in the model running but no value being logged for that parameter.

If the proposal described in this issue is adopted, the 3 lines of Python above could be rewritten as:

model.add(Dense(mlflow.log_param("NUM_HIDDEN", 30)))   # Proposed MLflow usage pattern

This would greatly reduce the possibility of errors in the use of MLflow.
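Until something like this lands, the same pattern can be approximated with a small wrapper. The following is only a minimal sketch, not MLflow's implementation: `returning` and `fake_log_param` are hypothetical names introduced for illustration, and `fake_log_param` stands in for mlflow.log_param so the snippet runs without MLflow installed or a tracking server available.

```python
from typing import Any, Callable, Dict, TypeVar

T = TypeVar("T")


def returning(log_fn: Callable[[str, Any], Any]) -> Callable[[str, T], T]:
    """Wrap a log_param-style callable so it returns the value it logged.

    Hypothetical helper for illustration; not part of MLflow.
    """

    def log_and_return(key: str, value: T) -> T:
        log_fn(key, value)  # side effect: record the parameter
        return value        # hand the same value back to the caller

    return log_and_return


# Hypothetical stand-in for mlflow.log_param(key, value).
logged: Dict[str, Any] = {}


def fake_log_param(key: str, value: Any) -> None:
    logged[key] = value


# With MLflow installed this would be: log_param = returning(mlflow.log_param)
log_param = returning(fake_log_param)

# The call is now an expression, so it can sit inline where the value is used,
# e.g. Dense(log_param("NUM_HIDDEN", 30)).
num_hidden = log_param("NUM_HIDDEN", 30)
print(num_hidden, logged)
```

Because the value is written once, at the point of use, the name mismatch, value drift, and forgotten-call failure modes listed above have no room to occur.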

May 28 '21 19:05 plaurent

I also think it would be pretty helpful for these params to refer to the same objects (pointers), so that when params are logged earlier in the code and later changed by frameworks, the changes are reflected in MLflow.

Nov 05 '21 20:11 abhinavthomas

@serena-ruan volunteers to do this task, assign it to her. :)

Aug 12 '22 12:08 WeichenXu123

@dbczumar Shall we make MlflowClient.log_param return the param value as well?

Aug 16 '22 11:08 WeichenXu123

As a follow-up, do we need to update log_params as well?

Aug 18 '22 06:08 harupy
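
On the log_params question, one possible shape (an assumption for discussion, not a decided design) is to return the dict that was passed in, so it can be unpacked straight into whatever consumes the parameters. In this sketch `returning_params`, `fake_log_params`, and `build_model` are hypothetical names, and `fake_log_params` stands in for mlflow.log_params so the snippet runs on its own.

```python
from typing import Any, Callable, Dict

Params = Dict[str, Any]


def returning_params(log_many: Callable[[Params], Any]) -> Callable[[Params], Params]:
    """Wrap a log_params-style callable so it returns the dict it logged.

    Hypothetical helper for illustration; not part of MLflow.
    """

    def log_and_return(params: Params) -> Params:
        log_many(params)  # side effect: record every key/value pair
        return params     # return the same dict for inline use

    return log_and_return


# Hypothetical stand-in for mlflow.log_params(params).
recorded: Params = {}


def fake_log_params(params: Params) -> None:
    recorded.update(params)


# With MLflow installed this would be: log_params = returning_params(mlflow.log_params)
log_params = returning_params(fake_log_params)


def build_model(num_hidden: int, learning_rate: float) -> Params:
    """Toy model factory that just echoes its hyperparameters."""
    return {"num_hidden": num_hidden, "learning_rate": learning_rate}


# Log and consume the parameters in a single expression.
model = build_model(**log_params({"num_hidden": 30, "learning_rate": 0.01}))
print(model, recorded)
```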