`dvc exp run --queue`: deleting file 'params.yaml' for each command call

Open alimagadovk opened this issue 3 years ago • 6 comments

Bug Report

dvc exp run --queue: deleting file 'params.yaml' for each command call

Description

dvc exp run --queue deletes the file 'params.yaml' on every attempt to add a new experiment, and the experiment is not put into the queue. It works correctly only if git add params.yaml was run beforehand.

Reproduce

Example files: test.py

import numpy as np
import json
import yaml

params = yaml.safe_load(open('params.yaml'))["test"]

precision = np.random.random()
recall = params['value']
accuracy = np.random.random()

rows = {'precision': precision,
        'recall': recall,
        'accuracy': accuracy}

with open(params['metrics_path'], 'w') as outfile:
    json.dump(rows, outfile)

fpr = 10*np.random.random((1,10)).tolist()
tpr = 10*np.random.random((1,10)).tolist()

with open('plot.json', 'w') as outfile2:
    json.dump(
      {
        "roc": [ {"fpr": f, "tpr": t} for f, t in zip(fpr, tpr) ]
      }, 
      outfile2
      )

params.yaml

test:
  metrics_path: scores.json
  value: 2

dvc.yaml

stages:
  test:
    cmd: python test.py
    deps:
    - test.py
    params:
    - test.metrics_path
    - test.value
    metrics:
    - scores.json:
        cache: false
    plots:
    - plot.json:
        cache: false
        x: fpr
        y: tpr

1. dvc exp run --queue -S test.value=101
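For readers unfamiliar with -S, here is a minimal sketch of what a dotted override such as test.value=101 amounts to on the parsed params. It works on a plain dict; `set_param` is a hypothetical helper for illustration, not DVC's implementation.

```python
import copy


def set_param(params, dotted_key, value):
    """Return a copy of `params` with the nested `dotted_key` set to `value`.

    Hypothetical helper: it only illustrates the effect of an override like
    `-S test.value=101`; it is not how DVC implements --set-param.
    """
    updated = copy.deepcopy(params)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for key in parents:
        # Walk down the nested mapping, creating levels that do not exist yet.
        node = node.setdefault(key, {})
    node[leaf] = value
    return updated


# The params.yaml from this report, as it would look once parsed.
params = {"test": {"metrics_path": "scores.json", "value": 2}}
queued = set_param(params, "test.value", 101)

print(queued["test"]["value"])  # value the queued experiment should see
print(params["test"]["value"])  # original value, left untouched
```

The point for this issue is the second print: the override belongs to the queued experiment, while the params file in the workspace is expected to keep its original contents.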

Expected

New experiments are added to the queue without 'params.yaml' being deleted.

Environment information

Ubuntu 20.04

Output of dvc doctor:

$ dvc doctor

DVC version: 2.10.2 (pip)
---------------------------------
Platform: Python 3.9.7 on Linux-5.13.0-40-generic-x86_64-with-glibc2.31
Supports:
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p6
Caches: local
Remotes: None
Workspace directory: ext4 on /dev/nvme0n1p6
Repo: dvc, git

Additional Information (if any):

Jun 03 '22 07:06 alimagadovk

probably related to https://github.com/iterative/dvc/issues/6930 (untracked files being removed on post-run apply)

Jun 03 '22 08:06 pmrowla

Only occurs if --set-param modifies an untracked params file and --temp or --queue flag is passed.

The untracked params file gets added with scm.add in _update_params and is then stashed to stash_rev.

When using WorkspaceExecutor, init_git applies a merge on the current git repo, "restoring" the params file in the user workspace.

When using TmpDirExecutor, the merge is applied on the temp dir git repo, so the params file is never "restored" in the user workspace.

Aug 02 '22 09:08 daavoo
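The difference between the two executors described above can be sketched with a toy model. Everything here is an assumption made for illustration: the workspace is a plain dict of file name to contents, and `queue_with_override` is a hypothetical function, not DVC code.

```python
def queue_with_override(workspace, override, executor):
    """Toy model of the add/stash/merge sequence described above.

    `workspace` maps file names to contents. Returns a tuple of
    (workspace afterwards, directory the experiment runs in).
    The run directory only models the stashed params file.
    """
    workspace = dict(workspace)  # do not mutate the caller's dict
    # The override is written into the untracked params file, which is then
    # added and stashed; stashing takes the file out of the workspace.
    stashed = {"params.yaml": {**workspace.pop("params.yaml"), **override}}

    if executor == "workspace":
        # The merge is applied in the user's repo, so the file reappears
        # there, carrying the overridden values.
        workspace.update(stashed)
        run_dir = workspace
    elif executor == "tmpdir":
        # The merge is applied in a separate temp dir repo; nothing puts
        # the file back into the user's workspace.
        run_dir = dict(stashed)
    else:
        raise ValueError(f"unknown executor: {executor!r}")
    return workspace, run_dir


files = {"params.yaml": {"value": 2}, "test.py": "..."}

ws_after, _ = queue_with_override(files, {"value": 101}, "workspace")
print("params.yaml" in ws_after)   # file is back in the workspace

tmp_after, run_dir = queue_with_override(files, {"value": 101}, "tmpdir")
print("params.yaml" in tmp_after)  # file is gone from the workspace
```

In this model the workspace case brings the file back only as a side effect of the merge, and with the overridden values rather than the original ones.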

It seems like the behavior is expected then. However, maybe we can clarify in the docs that experiment files should be tracked by either Git or DVC?

Aug 02 '22 19:08 dberenbaum
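Until this is addressed, a pre-flight check along these lines could catch the problem before queuing. It is a minimal sketch that assumes git is on the PATH and the script runs inside the repository; `is_tracked_by_git` is a hypothetical helper, and it only covers Git tracking, not files tracked by DVC.

```python
import subprocess


def is_tracked_by_git(path, run=subprocess.run):
    """Return True if `path` is tracked by Git in the current repository.

    Uses `git ls-files --error-unmatch`, which exits with a non-zero status
    when the path is not known to Git. The `run` parameter exists only so
    the command runner can be swapped out (for example in tests).
    """
    result = run(
        ["git", "ls-files", "--error-unmatch", path],
        capture_output=True,
    )
    return result.returncode == 0
```

A wrapper script could call is_tracked_by_git("params.yaml") and refuse to run dvc exp run --queue -S ... when it returns False, suggesting git add params.yaml first.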

I stumbled over this today and found it at least "unexpected" and did not know how to fix it, until I came here.

Sep 08 '22 15:09 behrica

Yes, I think I was incorrect to say it is expected behavior. In fact, I ran into it myself and was confused recently (see #8256). Should we be applying the merge on top of both the tmp dir and the workspace @pmrowla?

Sep 08 '22 16:09 dberenbaum

Applying the merge on the workspace isn't the correct behavior: that would put the -S modified params into the workspace instead of the original params file.

I think the fix for this should just be to use stash_workspace(include_untracked=True) here https://github.com/iterative/dvc/blob/55b16166c852a72e8dd47d4e32090063d27497e4/dvc/repo/experiments/queue/base.py#L320

When we leave that context it is supposed to restore the workspace to the state before any DVC exp related changes/operations were done. include_untracked was omitted because we aren't supposed to be touching untracked files at all in exps, but the bug is we actually can modify untracked files in the params file case.

Sep 09 '22 01:09 pmrowla
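The restore-on-exit idea behind the proposed fix can be sketched as a small context manager. This is a toy model using plain file copies, not DVC's stash_workspace: `restore_files_on_exit` and its `tracked` argument are hypothetical, and only the `include_untracked` name mirrors the parameter mentioned above.

```python
import contextlib
import shutil
import tempfile
from pathlib import Path


@contextlib.contextmanager
def restore_files_on_exit(workspace, include_untracked=False, tracked=()):
    """Snapshot top-level files on entry and copy them back on exit.

    Toy stand-in for a stash/restore context. Files not listed in `tracked`
    are only snapshotted when `include_untracked` is True.
    """
    workspace = Path(workspace)
    names = [
        p.name
        for p in workspace.iterdir()
        if p.is_file() and (include_untracked or p.name in tracked)
    ]
    with tempfile.TemporaryDirectory() as backup:
        for name in names:
            shutil.copy2(workspace / name, Path(backup) / name)
        try:
            yield
        finally:
            # Restore the snapshot, undoing whatever happened in the block.
            for name in names:
                shutil.copy2(Path(backup) / name, workspace / name)


with tempfile.TemporaryDirectory() as ws:
    params = Path(ws) / "params.yaml"
    params.write_text("test:\n  value: 2\n")
    with restore_files_on_exit(ws, include_untracked=True):
        params.unlink()  # stands in for the queue step removing the file
    restored = params.read_text()

print(restored)
```

With include_untracked left at False and params.yaml not listed in `tracked`, the same block leaves the file missing afterwards, which matches the behavior reported in this issue.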