ray icon indicating copy to clipboard operation
ray copied to clipboard

[Tune] Checkpoints returned by Tuner do not point to cloud

Open Yard1 opened this issue 2 years ago • 0 comments

What happened + What you expected to happen

If a Tuner has a sync_config set, the expectation is that the Checkpoints contained within Results returned would point to the cloud URI. Instead, they point to local path, which may or may not exist.

This issue is not present in the old ExperimentAnalysis API which correctly returns a Checkpoint pointing to cloud (see repro script).

Versions / Dependencies

master (e341fa7d5fe7b873c3a9a6150d400dff61ffe1b6)

Reproduction script


import ray
from ray import tune
from ray import air
from ray.air import session, Checkpoint

sync_config = tune.SyncConfig(
    upload_dir="s3://path-here",
)

def trainable(config):
    for i in range(5):
        checkpoint = Checkpoint.from_dict({"model": i})
        session.report(metrics={"metric": i}, checkpoint=checkpoint)

# this starts the run!
tuner = tune.Tuner(
    trainable,
    run_config=air.RunConfig(
        sync_config=sync_config,
    ),
    tune_config=tune.TuneConfig(
        metric="metric",
        mode="max",
    )
)
results = tuner.fit()

# assertion fails
assert results._experiment_analysis.best_checkpoint.uri == results[0].checkpoint.uri

Issue Severity

High: It blocks me from completing my task.

Yard1 avatar Dec 09 '22 22:12 Yard1