exp apply: not working for failed experiments
Bug Report
Description
When an experiment fails and I want to change something and re-run it, I first need to apply it, make my changes, and add the new version to the queue. Unfortunately, running dvc exp apply <hash> on a failed experiment produces the following result:
2022-08-25 09:47:00,888 ERROR: '20ea06d' does not appear to be an experiment commit.: Experiment derived from 'celeryf', expected '3b0d8e3'.
------------------------------------------------------------
Traceback (most recent call last):
File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/apply.py", line 38, in apply
exps.check_baseline(exp_rev)
File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 378, in check_baseline
raise BaselineMismatchError(exp_baseline, baseline_sha)
dvc.repo.experiments.exceptions.BaselineMismatchError: Experiment derived from 'celeryf', expected '3b0d8e3'.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
ret = cmd.do_run()
File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/cli/command.py", line 22, in do_run
return self.run()
File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/commands/experiments/apply.py", line 14, in run
self.repo.experiments.apply(
File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 499, in apply
return apply(self.repo, *args, **kwargs)
File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/__init__.py", line 48, in wrapper
return f(repo, *args, **kwargs)
File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/scm_context.py", line 156, in run
return method(repo, *args, **kw)
File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/apply.py", line 40, in apply
raise InvalidExpRevError(rev) from exc
dvc.repo.experiments.exceptions.InvalidExpRevError: '20ea06d' does not appear to be an experiment commit.
------------------------------------------------------------
2022-08-25 09:47:00,891 DEBUG: Analytics is enabled.
2022-08-25 09:47:00,917 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpmqf25q63']'
2022-08-25 09:47:00,919 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpmqf25q63']
Reproduce
I cannot share my code, and I don't think preparing a toy example is needed here.
Expected
Changes to code/configuration files are applied in the workspace as they have been scheduled for execution.
Environment information
Output of dvc doctor:
DVC version: 2.18.1 (pip)
---------------------------------
Platform: Python 3.9.5 on Linux-5.4.0-124-generic-x86_64-with-glibc2.31
Supports:
azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
s3 (s3fs = 2022.5.0, boto3 = 1.21.21),
webhdfs (fsspec = 2022.5.0)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-home--vg
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-home--vg
Repo: dvc, git
Kind regards, macio232
Hello, @macio232. This happens because DVC doesn't store the results of a failed experiment, and in most cases a failed experiment will not update the pipeline's outputs anyway. If you want to examine the stdout of a failed experiment that was run through the queue, use dvc queue logs {exp-name}.
@karajan1001 I know what caused the error, because I already used dvc queue logs {exp-name} (well, dvc queue logs {exp-hash} to be precise, because you cannot reference a failed experiment by its name). Based on this knowledge, I want to update my experiment and run it again. To do so, I first need to recover the code and configs used to run the experiment, so I need a working dvc exp apply. I don't expect to get any outputs from dvc exp apply.
Yeah, it actually was stored, in a particular place for failed experiments (as stashes, not Git commits), but we currently lack an API to pop the failed stashes.
For now, there is a slightly hacky way to check it out. You can try:
git checkout $(cat .git/refs/exps/celery/failed)
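To see which commit the command above would check out before running it, the ref file can be read directly. This is only a minimal sketch: the helper name read_failed_stash_rev is hypothetical (it is not part of DVC), it assumes the ref exists as a loose file at the path quoted above rather than in packed-refs, and, as far as I understand Git stash refs, the file points only at the most recent entry in the failed stash.

```python
from pathlib import Path
from typing import Optional


def read_failed_stash_rev(repo_root: str) -> Optional[str]:
    """Return the SHA stored in .git/refs/exps/celery/failed, if present.

    Hypothetical helper for illustration only. Assumes the ref is a loose
    file (not packed) at the path mentioned in the thread.
    """
    ref_file = Path(repo_root) / ".git" / "refs" / "exps" / "celery" / "failed"
    if not ref_file.is_file():
        return None
    # A loose ref file holds a single SHA followed by a newline.
    sha = ref_file.read_text().strip()
    return sha or None
```

Calling read_failed_stash_rev(".") from the repository root returns None when no failed experiment has been stashed, which makes it a cheap check before attempting the checkout.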
exp apply needs to check the failed refs stash now. This usage of apply worked before the celery changes (since failed exps were just re-added to the regular queue), so this should be considered a regression (and it's a simple fix):
https://github.com/iterative/dvc/blob/063eb6904dc79c2e5be9e1b57f7ecaa781eded8b/dvc/repo/experiments/apply.py#L42
This just needs to be something like
stash_rev = exp_rev in exps.stash_revs or exp_rev in exps.celery_queue.failed_stash.stash_revs
(apply doesn't pop from the stash; we just need to check that the Git SHA exists in one of our stashes).
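The membership check proposed above can be illustrated in isolation. The sketch below does not use DVC's real classes: FakeStash and is_stash_rev are hypothetical stand-ins, and the only assumption carried over from the thread is that each stash exposes a stash_revs mapping keyed by Git SHA.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeStash:
    """Hypothetical stand-in for an experiment stash (not DVC's class)."""

    # Maps a Git SHA to some description of the stashed experiment.
    stash_revs: Dict[str, str] = field(default_factory=dict)


def is_stash_rev(exp_rev: str, queue_stash: FakeStash, failed_stash: FakeStash) -> bool:
    """Return True if exp_rev is present in either stash.

    Nothing is popped; this is a read-only membership test, mirroring the
    one-line check suggested in the thread.
    """
    return exp_rev in queue_stash.stash_revs or exp_rev in failed_stash.stash_revs


queue = FakeStash({"aaa1111": "queued experiment"})
failed = FakeStash({"20ea06d": "failed experiment"})

# A revision that only lives in the failed stash is still recognised.
found = is_stash_rev("20ea06d", queue, failed)
```

With a check of this shape, a revision found only in the failed stash would be treated like any other stashed experiment, which is the behaviour the original report asks for.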