[Tune] Checkpoints returned by Tuner do not point to cloud
What happened + What you expected to happen
If a Tuner has a sync_config set, the expectation is that the Checkpoints contained in the returned Results point to the cloud URI. Instead, they point to a local path, which may or may not exist.
This issue is not present in the old ExperimentAnalysis API, which correctly returns a Checkpoint pointing to the cloud (see repro script).
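For illustration, the expected mapping from a local checkpoint path to its cloud location could be sketched as below. This is only a sketch, not Ray code: `to_cloud_uri` is a hypothetical helper, the example paths are made up, and it assumes the directory tree synced under upload_dir mirrors the tree under the local results directory.

```python
from pathlib import PurePosixPath


def to_cloud_uri(local_checkpoint_path: str, local_dir: str, upload_dir: str) -> str:
    """Map a local checkpoint path to the URI it would have under upload_dir.

    Hypothetical helper (not part of Ray). Assumes the remote layout under
    upload_dir mirrors the local layout under local_dir.
    """
    local = PurePosixPath(local_checkpoint_path)
    root = PurePosixPath(local_dir)
    # relative_to raises ValueError if the checkpoint is not under local_dir.
    relative = local.relative_to(root)
    return upload_dir.rstrip("/") + "/" + relative.as_posix()


# Made-up example paths, for illustration only.
uri = to_cloud_uri(
    "/home/user/ray_results/exp/trial_0/checkpoint_000004",
    "/home/user/ray_results",
    "s3://path-here",
)
print(uri)  # s3://path-here/exp/trial_0/checkpoint_000004
```

With a mapping like this, the URI on a Checkpoint from Results would be expected to match the one returned through ExperimentAnalysis.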
Versions / Dependencies
master (e341fa7d5fe7b873c3a9a6150d400dff61ffe1b6)
Reproduction script
import ray
from ray import tune
from ray import air
from ray.air import session, Checkpoint

sync_config = tune.SyncConfig(
    upload_dir="s3://path-here",
)


def trainable(config):
    for i in range(5):
        checkpoint = Checkpoint.from_dict({"model": i})
        session.report(metrics={"metric": i}, checkpoint=checkpoint)


# this starts the run!
tuner = tune.Tuner(
    trainable,
    run_config=air.RunConfig(
        sync_config=sync_config,
    ),
    tune_config=tune.TuneConfig(
        metric="metric",
        mode="max",
    )
)
results = tuner.fit()

# assertion fails
assert results._experiment_analysis.best_checkpoint.uri == results[0].checkpoint.uri
Issue Severity
High: It blocks me from completing my task.