exp apply: not working for failed experiments

Open macio232 opened this issue 2 years ago • 5 comments

Bug Report

Description

When an experiment fails and I want to change something and re-run it, I first need to apply it, make my changes, and add the new version to the queue. Unfortunately, running dvc exp apply <hash> on a failed experiment produces the following error:

2022-08-25 09:47:00,888 ERROR: '20ea06d' does not appear to be an experiment commit.: Experiment derived from 'celeryf', expected '3b0d8e3'.
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/apply.py", line 38, in apply
    exps.check_baseline(exp_rev)
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 378, in check_baseline
    raise BaselineMismatchError(exp_baseline, baseline_sha)
dvc.repo.experiments.exceptions.BaselineMismatchError: Experiment derived from 'celeryf', expected '3b0d8e3'.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/commands/experiments/apply.py", line 14, in run
    self.repo.experiments.apply(
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 499, in apply
    return apply(self.repo, *args, **kwargs)
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/__init__.py", line 48, in wrapper
    return f(repo, *args, **kwargs)
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/scm_context.py", line 156, in run
    return method(repo, *args, **kw)
  File "/home/maciej/venvs/motor-generative-modelling/lib/python3.9/site-packages/dvc/repo/experiments/apply.py", line 40, in apply
    raise InvalidExpRevError(rev) from exc
dvc.repo.experiments.exceptions.InvalidExpRevError: '20ea06d' does not appear to be an experiment commit.
------------------------------------------------------------
2022-08-25 09:47:00,891 DEBUG: Analytics is enabled.
2022-08-25 09:47:00,917 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmpmqf25q63']'
2022-08-25 09:47:00,919 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmpmqf25q63']

Reproduce

I cannot share my code, and I don't think preparing a toy example is needed here.

Expected

The changes to code and configuration files are applied to the workspace exactly as they were when the experiment was scheduled for execution.

Environment information

Output of dvc doctor:

DVC version: 2.18.1 (pip)
---------------------------------
Platform: Python 3.9.5 on Linux-5.4.0-124-generic-x86_64-with-glibc2.31
Supports:
        azure (adlfs = 2022.4.0, knack = 0.9.0, azure-identity = 1.10.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.5.1),
        s3 (s3fs = 2022.5.0, boto3 = 1.21.21),
        webhdfs (fsspec = 2022.5.0)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/mapper/ubuntu--vg-home--vg
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/mapper/ubuntu--vg-home--vg
Repo: dvc, git

Kind regards, macio232

macio232, Aug 25 '22 09:08

Hello, @macio232. This happens because DVC does not store a failed experiment's results, and in most cases a failed experiment does not update the pipeline's outputs anyway. If you want to examine the stdout of a failed experiment that was run through the queue, use dvc queue logs {exp-name}.

karajan1001, Aug 25 '22 10:08

@karajan1001 I know what caused the error (I already used dvc queue logs {exp-name}, or rather dvc queue logs {exp-hash}, because you cannot reference a failed experiment by its name). Based on that, I want to update my experiment and run it again. To do so, I first need to recover the code and configs used to run the experiment, which is why I need a working dvc exp apply. I don't expect dvc exp apply to give me any outputs.

macio232, Aug 25 '22 10:08

Yeah, the failed experiment is actually stored, but in a separate place reserved for failed experiments (as stash entries, not as git commits). For now we lack an API to pop entries from the failed stash.

karajan1001, Aug 25 '22 13:08

For now, there is a slightly hacky way to check it out. You can try:

git checkout $(cat .git/refs/exps/celery/failed)

karajan1001, Aug 25 '22 13:08
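
The workaround above reads the ref file directly, which only works while the ref is stored as a loose file. Git can also move refs into .git/packed-refs, in which case the cat would fail. Below is a minimal Python sketch of a lookup that handles both cases. The helper name read_ref is hypothetical and not part of DVC; only the ref path refs/exps/celery/failed comes from the comment above. It also assumes the ref is a plain SHA rather than a symbolic ref.

```python
from pathlib import Path
from typing import Optional

# Ref name taken from the workaround above; everything else here is illustrative.
FAILED_REF = "refs/exps/celery/failed"


def read_ref(git_dir: Path, ref: str = FAILED_REF) -> Optional[str]:
    """Return the SHA that `ref` points to, or None if the ref is not found.

    Hypothetical helper (not a DVC API). Assumes the ref holds a plain SHA.
    """
    # Case 1: the ref is a loose file, e.g. .git/refs/exps/celery/failed.
    loose = git_dir / ref
    if loose.is_file():
        return loose.read_text().strip()

    # Case 2: the ref was packed into .git/packed-refs ("<sha> <refname>" lines).
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            # Skip blank lines, header comments, and peeled-tag lines.
            if not line or line.startswith(("#", "^")):
                continue
            sha, _, name = line.partition(" ")
            if name.strip() == ref:
                return sha
    return None
```

Calling read_ref(Path(".git")) from the repository root returns the SHA to pass to git checkout. If this ref behaves like a regular git stash ref, it points only at the most recent failed entry, so older failed experiments would not be reachable this way.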

exp apply needs to check the failed refs stash now. This usage of apply worked before the celery changes (since failed exps were just re-added to the regular queue), so this should be considered a regression (and it's a simple fix).

https://github.com/iterative/dvc/blob/063eb6904dc79c2e5be9e1b57f7ecaa781eded8b/dvc/repo/experiments/apply.py#L42

this just needs to be something like

stash_rev = exp_rev in exps.stash_revs or exp_rev in exps.celery_queue.failed_stash.stash_revs

(apply doesn't pop from the stash, we just need to check that the git SHA exists in one of our stashes)

pmrowla, Aug 25 '22 13:08
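
The check proposed in the last comment can be illustrated in isolation. The following is a sketch, not the actual DVC patch: FakeStash and is_stash_rev are hypothetical stand-ins, and the only thing assumed about the real objects is that each stash exposes a stash_revs container supporting the in operator, as the suggested one-liner implies.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeStash:
    """Hypothetical stand-in for a DVC experiment stash (not the real class)."""

    # Maps a stash commit SHA to whatever metadata the stash keeps for it.
    stash_revs: Dict[str, object] = field(default_factory=dict)


def is_stash_rev(exp_rev: str, queue_stash: FakeStash, failed_stash: FakeStash) -> bool:
    """Return True if exp_rev is an entry in the regular or the failed stash.

    Mirrors the suggested condition: a membership test only, nothing is popped.
    """
    return exp_rev in queue_stash.stash_revs or exp_rev in failed_stash.stash_revs


# Example: the failed experiment from the report lives only in the failed stash.
queue = FakeStash({"aaa111": None})      # made-up SHA for a queued experiment
failed = FakeStash({"20ea06d": None})    # SHA from the error message above
print(is_stash_rev("20ea06d", queue, failed))
```

With a check like this, a revision found only in the failed stash would be treated as a stash revision, so apply would no longer reject it as a non-experiment commit.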