[FR] Add symlink support for logging artifacts
Thank you for submitting an issue. Please refer to our issue policy for information on what types of issues we address.
Please fill in this template and do not delete it unless you are sure your issue is outside its scope.
Guidelines
Feature requests typically go through the following lifecycle:
- Submit feature request with high-level description on GitHub issues (this is what you're doing now)
- Discuss feature request with a committer, who may ask for a more detailed design
- After discussion & agreement on feature request, start implementation
Describe the proposal
The log_artifact method in the MLflow tracking API currently requires artifacts to be stored in local directories. It would be useful if we could log references to artifacts instead of the artifacts themselves.
Motivation
Artifacts are occasionally very large files. In those scenarios, they might already exist on a cloud-based file system. It does not make sense to transfer such a file to a local machine just to log it as an artifact, because (1) the file may not fit on local disk, and (2) download times are likely to be long.
Proposed Changes
Pseudocode for logging artifacts from a GCS artifact repository using the Python SDK:
mlflow.log_artifacts("gs://path/to/some/other/file/blob.txt", "mlflow/artifact/path")
We do not validate that the artifact exists or that the user has adequate auth scope. We simply store the URI so that it can be consumed by other operations such as download_artifact.
Hi @thisisandreeeee, can you provide an example workflow that is relevant to this use case? Presently, you may be able to achieve the behavior you're looking for by setting an artifact location as a tag on an MLflow run. You can leverage the mlflow artifacts download CLI (https://mlflow.org/docs/latest/cli.html#mlflow-artifacts-download) to fetch artifacts from these tagged locations. If it's helpful, we may be able to expose a public Python API for downloading this data as well.
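For anyone trying the tag-based workaround in the meantime, here is a minimal sketch of what it could look like. The "artifact_ref." tag prefix and both helper names are hypothetical conventions made up for this example; only mlflow.start_run and mlflow.set_tags are existing MLflow calls. As in the proposal, the URI is stored as-is, with no check that the object exists or that credentials are valid.

```python
from urllib.parse import urlparse


def artifact_reference_tags(references, prefix="artifact_ref."):
    """Map logical artifact names to run tags holding their remote URIs.

    The "artifact_ref." prefix is a hypothetical convention, not an MLflow one.
    """
    tags = {}
    for name, uri in references.items():
        # Only check that the string looks like a URI. Existence and auth
        # scope are deliberately not validated, matching the proposal.
        if not urlparse(uri).scheme:
            raise ValueError(f"expected a URI with a scheme, got {uri!r}")
        tags[prefix + name] = uri
    return tags


def log_artifact_references(references):
    """Record the references as tags on a new MLflow run and return its run ID."""
    import mlflow  # imported lazily so the helper above works without MLflow

    with mlflow.start_run() as run:
        mlflow.set_tags(artifact_reference_tags(references))
        return run.info.run_id
```

Calling log_artifact_references({"blob": "gs://path/to/some/other/file/blob.txt"}) would leave the run with a single tag pointing at the remote file, which downstream tooling can read back to locate the data.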
Here is one example workflow: I am running jobs via Slurm. I can programmatically figure out where the Slurm stdout/stderr logs go. If I symlink them to the artifact store, I should be able to view them in the MLflow UI (also related: #3222).
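A rough sketch of that workflow, assuming a local, file-based artifact directory and Slurm's default log naming (slurm-<jobid>.out in the submit directory, holding both stdout and stderr unless --output/--error override it). Both helper names are made up for this example, and whether the MLflow UI actually follows such a symlink is exactly what this request is asking about, so treat it as an illustration rather than a confirmed solution.

```python
import os
from pathlib import Path


def default_slurm_log(submit_dir, job_id):
    """Return Slurm's default log path, slurm-<jobid>.out, in the submit directory.

    Assumes the job did not override the location with --output/--error.
    """
    return Path(submit_dir) / f"slurm-{job_id}.out"


def link_into_artifact_dir(log_path, artifact_dir, name=None):
    """Symlink log_path into a local artifact directory and return the link path."""
    log_path = Path(log_path).resolve()
    artifact_dir = Path(artifact_dir)
    artifact_dir.mkdir(parents=True, exist_ok=True)
    link = artifact_dir / (name or log_path.name)
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link from an earlier attempt
    os.symlink(log_path, link)
    return link
```

Inside a running job, the log path could be built from the SLURM_SUBMIT_DIR and SLURM_JOB_ID environment variables, and the artifact directory could be taken from mlflow.get_artifact_uri() when that resolves to a local path.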