yocto-gl icon indicating copy to clipboard operation
yocto-gl copied to clipboard

[FR] Add symlink support for logging artifacts

Open thisisandreeeee opened this issue 4 years ago • 2 comments

Thank you for submitting an issue. Please refer to our issue policy for information on what types of issues we address.

Please fill in this template and do not delete it unless you are sure your issue is outside its scope.


Guidelines

Feature requests typically go through the following lifecycle:

  1. Submit feature request with high-level description on GitHub issues (this is what you're doing now)
  2. Discuss feature request with a committer, who may ask for a more detailed design
  3. After discussion & agreement on feature request, start implementation

Describe the proposal

The log_artifact method in the mlflow tracking API currently requires artifacts to be stored in local directories. It would be useful if we could log references to artifacts instead of the artifacts themselves.

Motivation

Artifacts are occasionally very large files. In those scenarios, they might already exist on a cloud-based file system. It does not make sense to transfer this large file to a local machine in order to log it as an artifact because (1) the file may not fit on local disk, and (2) download times are likely to be long.

Proposed Changes

Pseudocode for logging artifacts from a GCS artifact repository using the Python SDK:

mlflow.log_artifacts("gs://path/to/some/other/file/blob.txt", "mlflow/artifact/path")

We do not validate that the artifact exists or that the user has adequate auth scope. We simply store the URI so that it can be consumed by other operations such as download_artifact.

thisisandreeeee avatar Apr 05 '20 08:04 thisisandreeeee

Hi @thisisandreeeee, can you provide an example workflow that is relevant to this use case? Presently, you may be able to achieve the behavior you're looking for by setting an artifact location as a tag on an MLflow run. You can leverage the mlflow artifacts download CLI (https://mlflow.org/docs/latest/cli.html#mlflow-artifacts-download) to fetch artifacts from these tagged locations. If it's helpful, we may be able to expose a public Python API for downloading this data as well.

dbczumar avatar Jul 17 '20 06:07 dbczumar

Here is one example workflow: I am running jobs via slurm. I can programmatically figure out where the slurm stdout/stderr logs go. If I symlink them to the artifact store, I should be able to view them in the MLFlow UI (also related: #3222).

mshvartsman avatar Jan 31 '24 20:01 mshvartsman