Experiments don't properly run multiple pipelines

Open shcheklein opened this issue 3 years ago • 1 comments

Bug description

Queued experiments run the wrong pipeline. I was able to reproduce this on a minimal repo, but users are hitting the issue in more complex multi-pipeline scenarios (it's not a completely artificial edge case).

Reproduce

This is the repo structure to use:

(.venv) √ Projects/test-pipelines % tree .
.
├── pipeline1
│   ├── dvc.lock
│   └── dvc.yaml
└── pipeline2
    ├── dvc.lock
    └── dvc.yaml

2 directories, 4 files

where each dvc.yaml looks like this (with the stage named p1-echo in pipeline1 and p2-echo in pipeline2):

stages:
  p1-echo:
    cmd: echo test

Run:

cd pipeline1
dvc exp run --queue
cd ../pipeline2
dvc exp run --queue
cd ..
dvc exp run --run-all
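
For convenience, the layout and commands above could be combined into a single script along these lines. This is only a sketch under a few assumptions: `REPRO_DIR` and `write_pipeline` are hypothetical names introduced here, the repo is created from scratch with `git init` and `dvc init`, and the `dvc.lock` files shown in the tree are not generated beforehand. The DVC steps are skipped when `dvc` or `git` is not on the PATH.

```shell
#!/bin/sh
# Sketch of the reproduction steps. REPRO_DIR is a hypothetical variable,
# not something DVC reads; it defaults to a fresh temporary directory.
REPRO_DIR="${REPRO_DIR:-$(mktemp -d)}"

# Write a one-stage dvc.yaml into $REPRO_DIR/<dir> with the given stage name.
write_pipeline() {
    mkdir -p "$REPRO_DIR/$1"
    printf 'stages:\n  %s:\n    cmd: echo test\n' "$2" > "$REPRO_DIR/$1/dvc.yaml"
}

write_pipeline pipeline1 p1-echo
write_pipeline pipeline2 p2-echo

# Only attempt the DVC part when both tools are available.
if command -v dvc >/dev/null 2>&1 && command -v git >/dev/null 2>&1; then
    (
        cd "$REPRO_DIR" || exit 1
        git init -q && dvc init -q
        git add . && git commit -q -m "two pipelines"
        # Queue one experiment from each pipeline directory.
        (cd pipeline1 && dvc exp run --queue)
        (cd pipeline2 && dvc exp run --queue)
        # Run the queue from the repo root, as in the report.
        dvc exp run --run-all
    )
fi
```

If the bug is present, the followed logs should show the same stage name for both queued experiments instead of one run per pipeline.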

The result I'm getting is:

Following logs for all queued experiments. Use Ctrl+C to stop following logs (experiment execution will continue).

Running stage 'p2-echo':
> echo test
test
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'
...
Running stage 'p2-echo':
> echo test
test
Generating lock file 'dvc.lock'
Updating lock file 'dvc.lock'

To track the changes with git, run:

...

Reproduced experiment(s): exp-0f110
To apply the results of an experiment to your workspace run:
...

Expected result

It runs p2-echo twice, while it should run the two different targets and output p1-echo + p2-echo.

Environment

DVC version: 2.17.1.dev1+g31635b83.d20220808
---------------------------------
Platform: Python 3.9.13 on macOS-12.4-arm64-arm-64bit
Supports:
	azure (adlfs = 2022.7.0, knack = 0.9.0, azure-identity = 1.10.0),
	gdrive (pydrive2 = 1.14.0),
	gs (gcsfs = 2022.5.0),
	hdfs (fsspec = 2022.5.0, pyarrow = 8.0.0),
	webhdfs (fsspec = 2022.5.0),
	http (aiohttp = 3.8.1, aiohttp-retry = 2.5.2),
	https (aiohttp = 3.8.1, aiohttp-retry = 2.5.2),
	s3 (s3fs = 2022.5.0, boto3 = 1.21.21),
	ssh (sshfs = 2022.6.0),
	oss (ossfs = 2021.8.0),
	webdav (webdav4 = 0.9.7),
	webdavs (webdav4 = 0.9.7)
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: None
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git

Other considerations / potential bugs

  • When you do a regular dvc exp run in two different directories, it doesn't create two different experiments (even though it runs two different targets in this case). I would consider this a bug as well. I don't know whether we need a separate issue for it or whether it has the same root cause as this one.
  • UI / Terminology - I got completely confused by "Reproduced experiment(s)". I thought it was about the run cache, or some existing experiments, etc. Why do we use "reproduced" for a completely new experiment, for example? And when it doesn't actually create a new experiment (probably because it detects an existing one), it doesn't say anything at all. cc @dberenbaum

shcheklein · Aug 09 '22 19:08

On the terminology question, the reproduced messages are from before experiments were released. I wrote them that way at the time since the initial development framing was "repro with saved results". Most of the UI messaging just hasn't been revisited at all since then, but I don't think we are particularly tied to anything here and can adjust it as needed.

pmrowla · Aug 09 '22 23:08