
[FR] Add image artifact to "Comparing 2 Runs" UI

Open rgaiacs opened this issue 4 years ago • 12 comments

Willingness to contribute

The MLflow Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature (either as an MLflow Plugin or an enhancement to the MLflow code base)?

  • [ ] Yes. I can contribute this feature independently.
  • [X] Yes. I would be willing to contribute this feature with guidance from the MLflow community.
  • [ ] No. I cannot contribute this feature at this time.

Proposal Summary

When working with images, it is useful to compare two image artifacts.

(attached screenshot: mlflow-fr)

Motivation

  • What is the use case for this feature?

    Jane is experimenting with an image filter. Jane uses MLflow to apply the filter to different images and saves the output images as MLflow artifacts. Jane visits the MLflow Tracking UI and gets a side-by-side view of the images when comparing two filters/experiments. (A sketch of how such images might be logged appears after this list.)

    Sarah is experimenting with some machine learning to identify particles. Sarah is aware of some edge cases that she must be careful with. Sarah records the edge cases as artifacts and uses the MLflow Tracking UI to inspect the improvement of the machine learning implementation.

  • Why is this use case valuable to support for MLflow users in general?

    This feature will benefit MLflow users that work with images.

  • Why is this use case valuable to support for your project(s) or organization?

    This feature will save us time.

  • Why is it currently difficult to achieve this use case? (please be as specific as possible about why related MLflow features and components are insufficient)

    MLflow "Comparing 2 Runs" Tracking UI doesn't list artifacts.

What component(s), interfaces, languages, and integrations does this feature affect?

Components

  • [ ] area/artifacts: Artifact stores and artifact logging
  • [ ] area/build: Build and test infrastructure for MLflow
  • [ ] area/docs: MLflow documentation pages
  • [ ] area/examples: Example code
  • [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • [ ] area/models: MLmodel format, model serialization/deserialization, flavors
  • [ ] area/projects: MLproject format, project running backends
  • [ ] area/scoring: Local serving, model deployment tools, spark UDFs
  • [ ] area/server-infra: MLflow server, JavaScript dev server
  • [X] area/tracking: Tracking Service, tracking client APIs, autologging

Interfaces

  • [X] area/uiux: Front-end, user experience, JavaScript, plotting
  • [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • [ ] area/windows: Windows support

Languages

  • [ ] language/r: R APIs and clients
  • [ ] language/java: Java APIs and clients
  • [ ] language/new: Proposals for new client languages

Integrations

  • [ ] integrations/azure: Azure and Azure ML integrations
  • [ ] integrations/sagemaker: SageMaker integrations
  • [ ] integrations/databricks: Databricks integrations

rgaiacs avatar Dec 15 '20 04:12 rgaiacs

@rgaiacs Thanks for filing this request. Currently, you can compare runs' parameters and metrics, along with schema signatures. Model Registry allows you to compare model versions' MLflow entities. Comparing artifacts is an interesting idea. One can envision comparing two SHAP images, or, as you point out, comparing images across experiment runs when there are a couple of images that went through a set of filters and max pooling. A problem may arise: which ones do you compare when you have a large batch of images, each undergoing a convolution? And which ones do you display for comparison when there can be hundreds of images?
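
To make the selection problem concrete, here is a rough sketch (not a proposed implementation) of how candidate image artifacts for two runs could be enumerated with the existing MlflowClient.list_artifacts API; the run IDs are placeholders:

```python
from mlflow.tracking import MlflowClient

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".bmp")


def list_image_artifacts(client: MlflowClient, run_id: str, path: str = ""):
    """Recursively collect image-like artifact paths for a run."""
    images = []
    for info in client.list_artifacts(run_id, path or None):
        if info.is_dir:
            images.extend(list_image_artifacts(client, run_id, info.path))
        elif info.path.lower().endswith(IMAGE_SUFFIXES):
            images.append(info.path)
    return images


client = MlflowClient()
for run_id in ("run_id_a", "run_id_b"):  # placeholder run IDs
    images = list_image_artifacts(client, run_id)
    print(run_id, len(images), "image artifacts")  # could easily be hundreds
```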

cc: @sueann @dbczumar @smurching @AveshCSingh

dmatrix avatar Dec 16 '20 18:12 dmatrix

@dmatrix Thanks for the feedback.

Regarding the number of images, I think this feature can initially be limited to a maximum number of artifacts per experiment (let's say 10 images for now) to avoid performance issues on both the server and client side. When Joe is writing their script, Joe can choose the 10 images based on their knowledge.

I believe this feature is useful during the exploration and debugging phase of a project and not practical during the benchmark phase. For example, let's say that Kat has 1000 images of cats and 1000 images of dogs to train a machine learning model that recognises cats. Kat trains the model and now needs to test it. Kat manually selects 10 edge-case images (an image with more than two cats, an image with one cat and one dog, ...) to be displayed/compared by MLflow. Kat discovers that the model doesn't recognise kids' drawings of cats and adds it to the issue tracker. When Kat is benchmarking against competitors' models, no images are displayed for comparison.
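
A minimal sketch of that convention, assuming a dedicated artifact path (here comparison/, an arbitrary name) is what the UI would read:

```python
import mlflow

# Kat hand-picks at most 10 edge cases; the file paths below are illustrative.
EDGE_CASES = [
    "edge_cases/two_cats.png",
    "edge_cases/cat_and_dog.png",
    "edge_cases/kids_drawing.png",
]

with mlflow.start_run(run_name="cat-detector-v2"):
    for path in EDGE_CASES[:10]:  # enforce the proposed per-run cap client-side
        mlflow.log_artifact(path, artifact_path="comparison")
```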

rgaiacs avatar Dec 17 '20 02:12 rgaiacs

Hi, the ability to compare artifacts from multiple runs is very interesting. In our team, we need to compare observed-vs-fitted or lift charts from multiple runs. Another example is comparing the feature importance graphs or SHAP results for two runs.

Is there any plan for developing this feature in MLflow? Thanks.
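
For context, a sketch of how such charts might be logged today so that every run produces an identically named artifact that a comparison view could align on; this assumes matplotlib and an MLflow version that provides mlflow.log_figure:

```python
import matplotlib.pyplot as plt
import mlflow


def log_lift_chart(lift_values, run_name: str) -> None:
    """Log a lift chart under a fixed artifact name so runs line up by name."""
    fig, ax = plt.subplots()
    ax.plot(lift_values)
    ax.set_xlabel("decile")
    ax.set_ylabel("lift")
    with mlflow.start_run(run_name=run_name):
        # Same file name in every run, so a future compare view could pair them.
        mlflow.log_figure(fig, "charts/lift_chart.png")
    plt.close(fig)


log_lift_chart([3.1, 2.4, 1.8, 1.2, 1.0], "model-a")
log_lift_chart([2.8, 2.2, 1.7, 1.3, 1.0], "model-b")
```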

Lucas-bayati avatar Mar 18 '21 01:03 Lucas-bayati

Really interested in this feature too. Our project has a lot of object detection models that run against multiple test sets/edge cases (let's say each test set has 25 images). It would be good to compare these test runs (from one model) side by side along with the images, or even multiple models on the same test set along with the images, to troubleshoot qualitatively faster.

Currently we have to do this manually and it takes several days of back-and-forth iterations :(
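
In the meantime, one possible logging convention (sketched with placeholder directory names, not an MLflow requirement) is to give every test set its own artifact subdirectory, so images from different runs can at least be paired path by path:

```python
from pathlib import Path

import mlflow


def log_test_set_images(test_set_dir: str) -> None:
    """Log every image in a local test-set folder under a per-set artifact path."""
    test_set = Path(test_set_dir)
    for image_path in sorted(test_set.glob("*.png"))[:25]:  # ~25 images per set
        mlflow.log_artifact(str(image_path), artifact_path=f"testsets/{test_set.name}")


with mlflow.start_run(run_name="detector-v3"):
    for test_set_dir in ("testsets/night", "testsets/occlusion"):  # placeholders
        log_test_set_images(test_set_dir)
```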

Thanks !

mohammedayub44 avatar Apr 13 '21 17:04 mohammedayub44

Interested in this feature as well to compare PPM/Recall curves of models trained with different data. Thx

intelligentaudit avatar May 11 '21 16:05 intelligentaudit

Our team is very interested in this feature being enabled in mlflow - is there a plan to get some momentum on its development?

R0ll1ngSt0ne avatar Oct 13 '21 15:10 R0ll1ngSt0ne

This feature is available on neptune.ai and seems to work quite well there. I don't see a reason to stop with images either: for example, I use Vega-Lite to generate interactive plots, and being able to view them side by side would be a big help.
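
As a data point, interactive specs can already be stored as run artifacts; a minimal sketch, assuming an MLflow version that provides mlflow.log_dict and using a toy Vega-Lite spec (the file name plots/forecast.vl.json is illustrative):

```python
import mlflow

# A tiny Vega-Lite spec; in practice this would come from Altair or be hand-written.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"values": [{"x": 1, "y": 2}, {"x": 2, "y": 3}]},
    "mark": "line",
    "encoding": {
        "x": {"field": "x", "type": "quantitative"},
        "y": {"field": "y", "type": "quantitative"},
    },
}

with mlflow.start_run(run_name="with-vega-spec"):
    # Stored as JSON; a side-by-side view would still need the UI to render such specs.
    mlflow.log_dict(spec, "plots/forecast.vl.json")
```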

semperigrinus avatar May 17 '22 12:05 semperigrinus

Any updates on this? I mean, there are so many options to turn something you want to compare into visual data. Thus this contribution would be essential, since with it you could compare any type of plot artifact, output image, or other visual representation of the model's performance. So I am not sure why this is not on the list of very important features.

benelot avatar Jul 13 '22 15:07 benelot

Very interesting :-)

thomasfarrierGjensidige avatar Oct 14 '22 09:10 thomasfarrierGjensidige

I'd love to have that. I work with time series forecasting and it's always important to look at previous versions' plots and compare them against each other. Any developments on this? Thank you!

susanameiras avatar Dec 14 '22 10:12 susanameiras

Any updates on this feature?

floriancircly avatar May 19 '23 07:05 floriancircly

I'm also interested in this feature. One way to go about this is to have a table of images, where rows are aligned using the figure name and columns are runs. The user would select which figures to display, similar to the existing parameter and metric drop-down menus.
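
To illustrate the proposed alignment outside the UI, a rough sketch that groups artifacts by file name across selected runs using MlflowClient.list_artifacts (the run IDs and the images/ prefix are placeholders):

```python
from collections import defaultdict

from mlflow.tracking import MlflowClient


def image_table(run_ids, artifact_path="images"):
    """Return {figure_name: {run_id: artifact_path}} for the selected runs."""
    client = MlflowClient()
    table = defaultdict(dict)
    for run_id in run_ids:
        for info in client.list_artifacts(run_id, artifact_path):
            if not info.is_dir:
                name = info.path.rsplit("/", 1)[-1]  # align rows on the file name
                table[name][run_id] = info.path
    return table


# Placeholder run IDs; each column of the proposed table would be one run.
print(image_table(["run_id_a", "run_id_b"]))
```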

trianta2 avatar Jul 26 '23 00:07 trianta2