[FR] `log_table` support for csv (and not only json)
Willingness to contribute
Yes. I can contribute this feature independently.
Proposal Summary
The experimental `log_table` feature is great! However, it only supports json. I think also supporting csv would benefit users and be easy to implement.
Motivation
What is the use case for this feature?
Log data as csv (instead of only json)
Why is this use case valuable to support for MLflow users in general?
- It's easier to view: mlflow already renders csv files nicely as HTML tables in the browser, whereas json files are shown as plain json.
- More flexibility, e.g. `log_dict` already supports both json and yml.
Why is this use case valuable to support for your project(s) or organization?
Why is it currently difficult to achieve this use case?
- It isn't really: `mlflow.log_text(df.to_csv(), artifact_file="example.csv")` would do the job. However, that is not intuitive given that there is a `log_table` function.
Details
- I would make use of the existing pandas `to_csv` function.
- I would choose between csv and json based on the provided `artifact_file` string: csv if the file extension is csv, json otherwise (as `log_dict` does for json/yaml).
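The extension-based choice described above could be sketched roughly as follows. This is only an illustration, not MLflow code: `table_format_for` and `serialize_table` are hypothetical helper names, `df` is assumed to be a pandas DataFrame (anything exposing `to_csv` and `to_json` works), and the `orient="split"` json layout is an assumption about what the json path would emit.

```python
import os


def table_format_for(artifact_file: str) -> str:
    """Return "csv" if artifact_file ends in .csv (case-insensitive), else "json"."""
    extension = os.path.splitext(artifact_file)[1].lower()
    return "csv" if extension == ".csv" else "json"


def serialize_table(df, artifact_file: str) -> str:
    """Serialize df as csv or json text, depending on the artifact file name.

    df is assumed to be a pandas DataFrame; both hypothetical helpers only
    rely on its to_csv / to_json methods.
    """
    if table_format_for(artifact_file) == "csv":
        # With no path argument, DataFrame.to_csv returns the csv text.
        return df.to_csv(index=False)
    # Assumed json layout; the existing json behaviour may differ.
    return df.to_json(orient="split", index=False)
```

With such a helper, the workaround from the motivation section would become `mlflow.log_text(serialize_table(df, "example.csv"), artifact_file="example.csv")`, and the same dispatch could live inside `log_table` itself.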
What component(s) does this bug affect?
- [X] `area/artifacts`: Artifact stores and artifact logging
- [ ] `area/build`: Build and test infrastructure for MLflow
- [ ] `area/deployments`: MLflow Deployments client APIs, server, and third-party Deployments integrations
- [ ] `area/docs`: MLflow documentation pages
- [ ] `area/examples`: Example code
- [ ] `area/model-registry`: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] `area/models`: MLmodel format, model serialization/deserialization, flavors
- [ ] `area/recipes`: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- [ ] `area/projects`: MLproject format, project running backends
- [ ] `area/scoring`: MLflow Model server, model deployment tools, Spark UDFs
- [ ] `area/server-infra`: MLflow Tracking server backend
- [ ] `area/tracking`: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
- [X] `area/uiux`: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] `area/docker`: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] `area/sqlalchemy`: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] `area/windows`: Windows support
What language(s) does this bug affect?
- [ ] `language/r`: R APIs and clients
- [ ] `language/java`: Java APIs and clients
- [ ] `language/new`: Proposals for new client languages
What integration(s) does this bug affect?
- [ ] `integrations/azure`: Azure and Azure ML integrations
- [ ] `integrations/sagemaker`: SageMaker integrations
- [ ] `integrations/databricks`: Databricks integrations
This feature makes sense! :)
The function in its current form seems to do much more than one would expect and has some inconsistencies:
- It supports images in tables and does some "magic" with them, such as saving them to a predefined folder name in two sizes. This is not documented, and I don't see a broad use case for it.
- The corresponding `load_table` does not have the functionality to load images.
I ran into another issue with `log_table`: unlike any of the other `log_artifact` functions, it uses a pattern in which it appends the artifact path to an mlflow tag, `mlflow.loggedArtifacts` (see here). Since mlflow tags have a max length of 5000, this creates an artificial limit on the number of artifacts that can be saved per run. Basically, the summed total length of all table artifact paths cannot exceed an mlflow tag's character limit, or `log_table` will error. For now one can get around this by using `log_artifact` instead, but this pattern should not be generalized.
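To make the limit concrete, here is a rough sketch of the arithmetic. It is not MLflow code: `tag_value_length` and `fits_in_tag` are hypothetical helpers, and the assumption that the tag holds a JSON list with one `{"path": ..., "type": "table"}` entry per table is only a guess at the encoding. The 5000-character limit is the one quoted above.

```python
import json

MAX_TAG_LENGTH = 5000  # tag value limit quoted in the comment above


def tag_value_length(paths):
    """Length of a hypothetical tag value listing every table artifact path.

    Assumes (not confirmed here) one JSON object per logged table.
    """
    entries = [{"path": path, "type": "table"} for path in paths]
    return len(json.dumps(entries))


def fits_in_tag(paths, limit=MAX_TAG_LENGTH):
    """True if the encoded list of paths still fits within the tag limit."""
    return tag_value_length(paths) <= limit
```

Under these assumptions every additional table costs its path length plus a fixed per-entry overhead, so a run with long, nested artifact paths would hit the limit after far fewer tables than a run with short ones.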
@marcosjt7 thanks for pointing this out. I think the overall focus of `log_table` is unclear and it should be reconsidered as a whole. I searched for the original issue behind this feature, which might help clarify, but I couldn't find anything.
@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.