Experiment Results Contain Random Rows
In my experiment, the results dataframe contains multiple rows for trial id 1, each with the same content as the following row, the only difference being the config. This causes problems: sometimes the best config is attributed to trial id 1, which then shows a config that did not actually achieve the best performance.
See this example: the true performance of trial id 1 is 81% (row 4), but trial id 1 also shows up in row 10 with the highest accuracy. I've added a simple example to reproduce this behavior.
from pathlib import Path

from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch

from syne_tune import StoppingCriterion, Tuner
from syne_tune.backend import SageMakerBackend
from syne_tune.config_space import randint
from syne_tune.optimizer.schedulers.fifo import FIFOScheduler

entry_point = Path('examples') / "training_scripts" / "height_example" / "train_height.py"
assert entry_point.is_file(), 'File unknown'

mode = "min"
metric = "mean_loss"
instance_type = 'ml.c5.4xlarge'
instance_count = 1
instance_max_time = 999
n_workers = 20

config_space = {
    "steps": 1,
    "width": randint(0, 20),
    "height": randint(-100, 100),
}

backend = SageMakerBackend(
    sm_estimator=PyTorch(
        entry_point=str(entry_point),
        instance_type=instance_type,
        instance_count=instance_count,
        role=get_execution_role(),
        max_run=instance_max_time,
        py_version='py3',
        framework_version='1.6',
    ),
    metrics_names=[metric],
)

# Random search without stopping
scheduler = FIFOScheduler(
    config_space=config_space,
    searcher='random',
    mode=mode,
    metric=metric,
)

tuner = Tuner(
    trial_backend=backend,
    scheduler=scheduler,
    stop_criterion=StoppingCriterion(max_wallclock_time=300),
    n_workers=n_workers,
)
tuner.run()
OK, I know you guys just hate it, but I'd still love to see some debug_log output. Then, we could see what that trial_id 1 is really doing over the course of the experiment.
I don't get one thing: your config space has height and width, but the table shows config_lr. I am also missing the steps attribute.
What I'd like to see is the results dataframe, which is just very different from that table. In the end, you create your plots and so on from the results, right?
The table refers to my original experiment. I created the example later and confirmed that the same happens but didn't take another screenshot. My original experiment had only one hyperparameter: lr
How is this table different from the results dataframe? This is load_experiment(tuner.name).results
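To make the duplication easy to spot in that dataframe, here is a minimal check (a sketch; it reuses tuner.name from the script above):
from syne_tune.experiments import load_experiment

df = load_experiment(tuner.name).results
# Any trial_id appearing more than once points at the duplicated rows
counts = df['trial_id'].value_counts()
print(counts[counts > 1])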
I have no problem with debug_log; I wasn't aware of it and wasn't sure how to use it properly. As you suggested, I activated it by passing it to the searcher and checked only the console output. Is this the intended use, or does it write more logs somewhere else? Beyond that, I couldn't spot anything suspicious. Trial id 1 actually finishes; nevertheless, more results are reported for it afterwards. I am currently running a much longer experiment. Let's see how frequently trial id 1 pops up.
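For reference, this is roughly how I passed it (a sketch; I'm assuming debug_log is accepted as a searcher option via search_options, and that its output goes through the standard logging module):
import logging

# debug_log writes via the logging module, so raise the log level first
logging.getLogger().setLevel(logging.INFO)

scheduler = FIFOScheduler(
    config_space=config_space,
    searcher='random',
    search_options={'debug_log': True},  # assumed option name
    mode=mode,
    metric=metric,
)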
Longer experiment: 287 rows in the table, 278 trials in total. Only trial id 1 occurs multiple times and it occurs within the first 24 rows (20 workers).
I don't know what load_experiment(...).results is doing. I'd recommend loading the CSV directly and checking what is going on. Thanks for raising this; we need to figure it out. Does this happen with the local backend as well? I've not been using the SageMaker backend much at all, so it may have quite some glitches.
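Something along these lines (a sketch; the path assumes syne-tune's default layout of ~/syne-tune/<tuner_name>/results.csv.zip, with the tuner name taken from your run):
import pandas as pd

# Substitute your own experiment folder name here
path = '~/syne-tune/train-height-2022-03-31-12-34-19-697/results.csv.zip'
df = pd.read_csv(path)
print(df['trial_id'].value_counts().head())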
That's basically what the function does, loading the results.csv.zip: https://github.com/awslabs/syne-tune/blob/main/syne_tune/experiments.py#L144 I've shared my results.csv (and I believe this one is for the example snippet above) internally.
I didn't face problems using only a single worker. I'll try many workers with the local backend and let you know.
I've started using the SM backend more frequently now. Fortunately, this is the first and only issue I've faced. Let's hope it is the last one as well.
Aaron is also using it more now, so let us figure this out!
Hi Martin, I just tried the example you gave and I do have a results.csv with one row per trial. Did you observe the issue with mainline? Can you confirm that the issue also happened with the script you gave?
I set up a new SM notebook instance, created a new conda environment, and ran the script above:
conda create -n test python=3.9.5 -y
conda activate test
pip install syne-tune
git clone https://github.com/awslabs/syne-tune.git
cd syne-tune
pip install -r requirements-ray.txt
python script.py
from syne_tune.experiments import load_experiment
load_experiment('train-height-2022-03-31-12-34-19-697').results['trial_id']
Again, multiple 1s showed up.
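To see what the duplicates look like side by side (a sketch, reusing the dataframe from above; the column names are assumed from the default results layout):
df = load_experiment('train-height-2022-03-31-12-34-19-697').results
# keep=False marks every occurrence of a duplicated trial_id
dups = df[df['trial_id'].duplicated(keep=False)]
print(dups.filter(regex='trial_id|config_|mean_loss'))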
I've run an experiment on a grid of 49 configurations, and all of them were evaluated during the search. The results table has 52 rows, 4 of which have trial id 1. Exactly 49 jobs were executed on SageMaker.
It is not limited to trial id 1: in addition to 1, I also saw duplicates for trial id 2.
We are working on finding a good setup to reproduce this issue as it happens sporadically.
I might have experienced this issue as well. Here is what happens for me. I am running an experiment with the SM backend, ASHA, and the lstm_wikitext2 benchmark. There are 10 seeds; 6 fail, 4 succeed.
In the 6 failed runs, this happens:
- trial_id 1 is successful and moves beyond 9 or 27 epochs
- the scheduler receives a report from trial_id 1 with resource=1, which leads to an exception
- in all cases, the next trial_id to report anything at resource=1 is always 10; "10" looks similar to "1" (?)
In the 4 successful runs, trial_id 1 is stopped at 1 or 3 epochs, at a time when trial_id 10 does not exist yet.
I'll dig a bit into this. If trial_ids are mixed up, we can detect this easily by passing the trial_id to the training function and reporting it back.
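A minimal sketch of that check (it assumes the trial_id gets injected into each trial's config, e.g. as an extra constant hyperparameter added just for debugging; the metric computation only loosely mirrors train_height.py):
from argparse import ArgumentParser
from syne_tune import Reporter

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--trial_id', type=int, required=True)
    parser.add_argument('--height', type=int, required=True)
    parser.add_argument('--width', type=int, required=True)
    args = parser.parse_args()

    report = Reporter()
    # Echo the trial_id back with every report; a mismatch between this value
    # and the trial_id the tuner attributes the report to would prove that
    # reports are being mixed up between trials.
    dummy_loss = args.height / 100 + args.width / 20
    report(mean_loss=dummy_loss, echoed_trial_id=args.trial_id)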
Hunch: the trial with id 10 is mistaken for trial_id 1. Maybe a path prefix is mismatched: in S3, a prefix query for "XYZ-1" also matches "XYZ-10", "XYZ-11", and so on (effectively "XYZ-1*"), unless you terminate the prefix as "XYZ-1/". I'll have a look.
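To illustrate the pitfall (a sketch with a hypothetical bucket and key layout, not syne-tune's actual code):
import boto3

s3 = boto3.client('s3')

# Prefix 'tuner/XYZ-1' also matches keys under XYZ-10/, XYZ-11/, ... (too broad)
broad = s3.list_objects_v2(Bucket='my-bucket', Prefix='tuner/XYZ-1')

# The trailing slash pins the listing to trial 1 only
exact = s3.list_objects_v2(Bucket='my-bucket', Prefix='tuner/XYZ-1/')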
Closing as #374 seems to have addressed the issue.