yocto-gl [FR] `log_table` support for csv (and not only json)

Willingness to contribute

Yes. I can contribute this feature independently.

Proposal Summary

The experimental log_table feature is great! However, it only supports json. I think also supporting csv would benefit users and be easy to implement.

Motivation

What is the use case for this feature?

Log data as csv (instead of only json)

Why is this use case valuable to support for MLflow users in general?

It's easier to view, since mlflow already renders csv artifacts as html tables in the browser, while json is shown as plain json.
It adds flexibility, similar to log_dict, which supports both json and yaml.

Why is this use case valuable to support for your project(s) or organization?

Why is it currently difficult to achieve this use case?

It isn't really difficult: mlflow.log_text(df.to_csv(), artifact_file="example.csv") would do the job. However, it is not intuitive, given that a log_table function exists.

Details

I would make use of the existing pandas DataFrame.to_csv method.
I would choose between csv and json based on the provided artifact_file string: csv if the file extension is .csv, otherwise json (similar to how log_dict chooses between json and yaml).
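A minimal sketch of the extension-based choice described above. The helper name serialize_table is hypothetical and not part of MLflow. To keep the example runnable without extra dependencies, it takes a column-oriented dict and uses the standard library csv module; an actual change would more likely call DataFrame.to_csv as proposed.

```python
import csv
import io
import json
import os


def serialize_table(data, artifact_file):
    """Serialize a column-oriented dict as CSV or JSON, chosen by file extension.

    Hypothetical helper for illustration only; not an MLflow API.
    """
    extension = os.path.splitext(artifact_file)[1].lower()
    columns = list(data.keys())
    if extension == ".csv":
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(columns)  # header row
        # Transpose the columns into rows and write them out.
        writer.writerows(zip(*(data[column] for column in columns)))
        return buffer.getvalue()
    # Any other extension falls back to JSON, as in the proposal.
    return json.dumps(data)


table = {"inputs": ["a", "b"], "outputs": [1, 2]}
csv_text = serialize_table(table, "example.csv")
json_text = serialize_table(table, "example.json")
```

Falling back to json for every extension other than .csv keeps existing calls to log_table behaving as they do today.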

What component(s) does this bug affect?

[X] area/artifacts: Artifact stores and artifact logging
[ ] area/build: Build and test infrastructure for MLflow
[ ] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
[ ] area/docs: MLflow documentation pages
[ ] area/examples: Example code
[ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
[ ] area/models: MLmodel format, model serialization/deserialization, flavors
[ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
[ ] area/projects: MLproject format, project running backends
[ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
[ ] area/server-infra: MLflow Tracking server backend
[ ] area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

[X] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
[ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
[ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
[ ] area/windows: Windows support

What language(s) does this bug affect?

[ ] language/r: R APIs and clients
[ ] language/java: Java APIs and clients
[ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

[ ] integrations/azure: Azure and Azure ML integrations
[ ] integrations/sagemaker: SageMaker integrations
[ ] integrations/databricks: Databricks integrations

Apr 25 '24 14:04 turbotimon

This feature makes sense ! :)

Apr 26 '24 00:04 WeichenXu123

The function in its current form seems to do much more than one would expect and has some inconsistencies:

It supports images in tables and does some "magic" with them, such as saving them to a predefined folder in two sizes. This is not documented, and I don't see a broad use case for it.
The corresponding load_table has no equivalent functionality for loading images.

May 01 '24 09:05 turbotimon

I ran into another issue with log_table: unlike any other of the log_artifact functions, it uses a pattern in which it appends the artifact path to an mlflow tag mlflow.loggedArtifacts (see here). Since mlflow tags have a max length of 5000, this creates an artificial limit to the number of artifacts that can be saved per run. Basically, the summed total length of all table artifact paths cannot exceed an mlflow tag's character limit, or log_table will error. For now one can get around this by using log_artifact instead, but this pattern should not be generalized.
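A rough illustration of how quickly that limit is reached. This is a simulation only: the JSON list-of-entries layout for the tag value and the path template are assumptions made for the example, not taken from the MLflow source, and the 5000 figure is the limit quoted above.

```python
import json

MAX_TAG_LENGTH = 5000  # tag length limit quoted above


def tables_until_limit(path_template="tables/eval_{:04d}.json"):
    """Count how many table paths fit before the serialized tag exceeds the limit.

    Assumes (for illustration) that the tag holds a JSON list with one
    {"path": ..., "type": "table"} entry per logged table.
    """
    entries = []
    while True:
        next_entry = {"path": path_template.format(len(entries)), "type": "table"}
        candidate = entries + [next_entry]
        if len(json.dumps(candidate)) > MAX_TAG_LENGTH:
            return len(entries)
        entries = candidate


max_tables = tables_until_limit()
print(max_tables)
```

Under these assumptions, longer artifact paths lower the count further, since each path is stored in full inside the tag value.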

May 01 '24 17:05 marcosjt7

@marcosjt7 thanks for pointing this out. I think the purpose of log_table is unclear and it should be reconsidered as a whole. I searched for the original issue for this feature, which might help to clarify things, but I couldn't find anything.

May 02 '24 10:05 turbotimon

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

May 03 '24 00:05 github-actions[bot]
