exp show: sync state between queue and exp show table

Open karajan1001 opened this issue 3 years ago • 11 comments

fix: #8088

  1. Refactor: separate the initialization of the executor from the environment setup.
  2. Move ref setup into executor.init_git.
  3. Add a new attribute, status, to the ExecutorInfo file.
  4. Write the running status to the executor infofile.
  5. Use the task status to replace collected.
  6. Move some basic test scripts from the functional tests to the unit tests.
  7. Add success/failed tests for the status change in the tempdir, celery, and workspace running cases.
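
Items 3 to 5 can be pictured with a minimal sketch. The names `TaskStatus` and `ExecutorInfo` come from this PR, but the fields, the enum members other than `PREPARING`, the numeric values, and the JSON layout below are assumptions made for illustration and may not match DVC's actual implementation.

```python
import json
from dataclasses import asdict, dataclass
from enum import IntEnum
from pathlib import Path


class TaskStatus(IntEnum):
    # Hypothetical members and values; only PREPARING is named in this thread.
    PENDING = 0
    PREPARING = 1
    RUNNING = 2
    SUCCESS = 3
    FAILED = 4


@dataclass
class ExecutorInfo:
    # Hypothetical subset of fields, chosen only for illustration.
    git_url: str
    baseline_rev: str
    status: TaskStatus = TaskStatus.PENDING

    def dump(self, path: Path) -> None:
        # Persist the info file so that another process can read the status.
        data = asdict(self)
        data["status"] = int(self.status)
        path.write_text(json.dumps(data))

    @classmethod
    def load(cls, path: Path) -> "ExecutorInfo":
        # Rebuild the object, converting the stored integer back to the enum.
        data = json.loads(path.read_text())
        data["status"] = TaskStatus(data["status"])
        return cls(**data)
```

With a layout like this, a reader such as exp show could load the file and look at `status` instead of relying on a separate "collected" flag.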

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

karajan1001 avatar Aug 22 '22 08:08 karajan1001

It looks like the manager needs to be cleaned up, or pylint will fail.

karajan1001 avatar Sep 09 '22 08:09 karajan1001

Excuse me, @mattseddon, Could you please help me to verify if the problem is solved after this PR on the VSCode extension? Thank you.

karajan1001 avatar Sep 14 '22 05:09 karajan1001

Excuse me, @mattseddon, Could you please help me to verify if the problem is solved after this PR on the VSCode extension? Thank you.

I will test 👍🏻.

mattseddon avatar Sep 14 '22 05:09 mattseddon

This is the experience in the extension using dvc queue start -j 1 with 3 queued experiments:

https://user-images.githubusercontent.com/37993418/190292424-3ee67d58-7405-4ec8-9567-b974333c31d9.mov

It is an issue that dvc exp show can run into unexpected errors that look like this:

ERROR: unexpected error - Invalid revision: b'02712dc464ab868043e7eefc335a8d5fd39ab6f7'

Question: Is running in exp show synced with TaskStatus.PREPARING in the executor or is there another field in the output that I should be looking for?


Note: Probably unrelated to this change but I started by trying to run with -j 3 and saw some very weird behaviour.

I went through the following steps:

  1. installed git+https://github.com/karajan1001/dvc.git@fix8088 into a demo project's virtual environment
  2. dvc exp run --queue x 3 with different params for each
  3. dvc queue start -j 3
  4. One experiment succeeded and two failed.

After those failures, any attempt to queue an experiment would result in the experiment being run straight away. Even though queue status stated that there were no active workers:

~/demo main !2 ?1 ❯ dvc queue status
Task     Name       Created    Status
e7bf66b             10:35 AM   Failed
ffc912f             10:35 AM   Failed
08a047c  exp-92e70  10:35 AM   Success
6e259ab  exp-684d6  10:41 AM   Success
aad7cf0  exp-14b96  10:40 AM   Success

Worker status: 0 active, 0 idle

~/demo main !2 ?1 ❯ dvc exp run --queue
Queued experiment '2f7751d' for future execution.

~/demo main !2 ?1 ❯ dvc queue status
Task     Name       Created    Status
2f7751d             10:42 AM   Running
e7bf66b             10:35 AM   Failed
ffc912f             10:35 AM   Failed
08a047c  exp-92e70  10:35 AM   Success
6e259ab  exp-684d6  10:41 AM   Success
aad7cf0  exp-14b96  10:40 AM   Success

Worker status: 0 active, 0 idle

The only way that I could get the repo out of this state was to delete .dvc/tmp/exps. dvc queue stop & dvc queue kill had no impact.

mattseddon avatar Sep 15 '22 01:09 mattseddon

Note: Probably unrelated to this change but I started by trying to run with -j 3 and saw some very weird behaviour.

I went through the following steps:

  1. installed git+https://github.com/karajan1001/dvc.git@fix8088 into a demo project's virtual environment
  2. dvc exp run --queue x 3 with different params for each
  3. dvc queue start -j 3
  4. One experiment succeeded and two failed.

After those failures, any attempt to queue an experiment would result in the experiment being run straight away. Even though queue status stated that there were no active workers:

For a job count of 3, you need to test it after https://github.com/iterative/dvc-task/pull/90 is merged.

Question: Is running in exp show synced with TaskStatus.PREPARING in the executor or is there another field in the output that I should be looking for?

exp show reads the TaskStatus of each experiment but does not depend on it alone, because the TaskStatus is only generated after the experiment begins to run.
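
One way to read this: an info file only exists once the experiment has started, so a reader has to fall back to the queue state when the file is missing. Below is a minimal sketch of that fallback. The function `resolve_status`, the `status` key, and the string labels are all hypothetical and are not taken from the DVC code base.

```python
import json
from pathlib import Path
from typing import Optional


def resolve_status(infofile: Path, is_queued: bool) -> Optional[str]:
    """Return a display status for one experiment (hypothetical helper)."""
    if infofile.exists():
        # The executor has started and written its own status.
        return json.loads(infofile.read_text()).get("status")
    # No info file yet: the only thing known is whether the task is queued.
    return "queued" if is_queued else None
```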

karajan1001 avatar Sep 15 '22 07:09 karajan1001

Verbose log for error:

~/projects/vscode-dvc/demo main *1 !4 ?1 ❯ dvc exp show --show-json -v
2022-09-16 10:23:52,332 ERROR: unexpected error - Invalid revision: b'd0b057085e1e96b3406f78cd9cb2decfb86976b3'
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 652, in fetch_refspecs
    check_diverged(
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dulwich/porcelain.py", line 347, in check_diverged
    raise DivergedBranches(current_sha, new_sha)
dulwich.porcelain.DivergedBranches: b'd0b057085e1e96b3406f78cd9cb2decfb86976b3'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 745, in diff
    commit_a = self.repo[os.fsencode(rev_a)]
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dulwich/repo.py", line 787, in __getitem__
    return self.object_store[self.refs[name]]
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dulwich/refs.py", line 320, in __getitem__
    raise KeyError(name)
KeyError: b'd0b057085e1e96b3406f78cd9cb2decfb86976b3'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/commands/experiments/show.py", line 475, in run
    all_experiments = self.repo.experiments.show(
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 516, in show
    return show(self.repo, *args, **kwargs)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/show.py", line 163, in show
    running = repo.experiments.get_running_exps(fetch_refs=fetch_running)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 443, in get_running_exps
    self._fetch_running_exp(
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 482, in _fetch_running_exp
    for ref in executor.fetch_exps(
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 358, in fetch_exps
    dest_scm.fetch_refspecs(
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 289, in _backend_func
    result = func(*args, **kwargs)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 658, in fetch_refspecs
    on_diverged(
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 349, in on_diverged_ref
    self._raise_ref_conflict(
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 734, in _raise_ref_conflict
    if scm.diff(orig_rev, new_rev):
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 289, in _backend_func
    result = func(*args, **kwargs)
  File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 748, in diff
    raise RevError("Invalid revision") from exc
scmrepo.exceptions.RevError: Invalid revision
------------------------------------------------------------
2022-09-16 10:23:52,444 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/.UUrYrHCXCoi7th3ronGrot.tmp'
2022-09-16 10:23:52,444 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/.UUrYrHCXCoi7th3ronGrot.tmp'
2022-09-16 10:23:52,445 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/.UUrYrHCXCoi7th3ronGrot.tmp'
2022-09-16 10:23:52,445 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/demo/.dvc/cache/.Smc3Bw3TUaCcMUU6quBhPH.tmp'
2022-09-16 10:23:52,445 DEBUG: Version info for developers:
DVC version: 1.0.2.dev2348+gb4beb4e8 
---------------------------------
Platform: Python 3.9.13 on macOS-12.6-arm64-arm-64bit
Subprojects:
        dvc_data = 0.7.1
        dvc_objects = 0.2.2
        dvc_render = 0.0.10
        dvc_task = 0.1.2
        dvclive = 0.10.0
        scmrepo = 0.1.1
Supports:
        http (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2022.8.2, boto3 = 1.24.59)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc (subdir), git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-09-16 10:23:52,446 DEBUG: Analytics is enabled.
2022-09-16 10:23:52,482 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmpyxrdv_qe']'
2022-09-16 10:23:52,484 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmpyxrdv_qe']'

mattseddon avatar Sep 16 '22 00:09 mattseddon

@mattseddon

The previous problem occurred because collecting the results failed during the git pull operation (because of duplicated experiment names), which made the final result collection fail. I have now moved all of the final dump operations into the cleanup function, which guarantees that they run in a finally scope.
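
The pattern described here can be sketched as follows. This is only an illustration of running the final status dump from a `finally` block. `run_with_cleanup` and its three callbacks are hypothetical names, not functions from DVC.

```python
from typing import Callable


def run_with_cleanup(
    run: Callable[[], None],
    collect: Callable[[], None],
    cleanup: Callable[[str], None],
) -> None:
    """Run an experiment and always report a final status (hypothetical)."""
    status = "failed"
    try:
        run()
        collect()  # may raise, e.g. on a ref conflict while pulling results
        status = "success"
    finally:
        # Executed whether or not run() or collect() raised, so the
        # final status is always written.
        cleanup(status)
```

Because the status starts as "failed" and is only upgraded after collection succeeds, an exception at any earlier point still leaves a consistent final state.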


Could you please try it again? I guess your local repo might have been polluted by the previous test and might need to be cleaned manually. Building a completely new workspace might help, although since the previous error arose in a dirty environment, a newly built one might not trigger it.

karajan1001 avatar Sep 19 '22 01:09 karajan1001

Could you please try it again

I will test today.

mattseddon avatar Sep 19 '22 23:09 mattseddon

@karajan1001 I'm still seeing the same behaviour. Even with a fresh clone of https://github.com/iterative/vscode-dvc:

dvc exp show --show-json -v
2022-09-20 15:36:25,649 ERROR: unexpected error - Invalid revision: b'1dbb61e08d6d96c6db910a6d392cb1a1bdb9d04d'
------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 652, in fetch_refspecs
    check_diverged(
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dulwich/porcelain.py", line 347, in check_diverged
    raise DivergedBranches(current_sha, new_sha)
dulwich.porcelain.DivergedBranches: b'2e1b8fbd00600a7457fb91fa14fa7d248a73913b'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 746, in diff
    commit_b = self.repo[os.fsencode(rev_b)]
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dulwich/repo.py", line 787, in __getitem__
    return self.object_store[self.refs[name]]
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dulwich/refs.py", line 320, in __getitem__
    raise KeyError(name)
KeyError: b'1dbb61e08d6d96c6db910a6d392cb1a1bdb9d04d'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
    ret = cmd.do_run()
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/cli/command.py", line 22, in do_run
    return self.run()
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/commands/experiments/show.py", line 475, in run
    all_experiments = self.repo.experiments.show(
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 516, in show
    return show(self.repo, *args, **kwargs)
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/show.py", line 163, in show
    running = repo.experiments.get_running_exps(fetch_refs=fetch_running)
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 443, in get_running_exps
    self._fetch_running_exp(
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 482, in _fetch_running_exp
    for ref in executor.fetch_exps(
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 363, in fetch_exps
    dest_scm.fetch_refspecs(
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 289, in _backend_func
    result = func(*args, **kwargs)
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 658, in fetch_refspecs
    on_diverged(
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 354, in on_diverged_ref
    self._raise_ref_conflict(
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 739, in _raise_ref_conflict
    if scm.diff(orig_rev, new_rev):
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 289, in _backend_func
    result = func(*args, **kwargs)
  File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 748, in diff
    raise RevError("Invalid revision") from exc
scmrepo.exceptions.RevError: Invalid revision
------------------------------------------------------------
2022-09-20 15:36:25,725 DEBUG: Removing '/Users/mattseddon/projects/vc-nc1/.RzXRu93vW73iJFXLaqEGG2.tmp'
2022-09-20 15:36:25,725 DEBUG: Removing '/Users/mattseddon/projects/vc-nc1/.RzXRu93vW73iJFXLaqEGG2.tmp'
2022-09-20 15:36:25,725 DEBUG: Removing '/Users/mattseddon/projects/vc-nc1/.RzXRu93vW73iJFXLaqEGG2.tmp'
2022-09-20 15:36:25,725 DEBUG: Removing '/Users/mattseddon/projects/vc-nc1/demo/.dvc/cache/.SR4jZgarDBcjiLMyqZnbZg.tmp'
2022-09-20 15:36:25,726 DEBUG: Version info for developers:
DVC version: 1.0.2.dev2371+g94c458d3 
---------------------------------
Platform: Python 3.9.13 on macOS-12.6-arm64-arm-64bit
Subprojects:
        dvc_data = 0.10.0
        dvc_objects = 0.4.0
        dvc_render = 0.0.11
        dvc_task = 0.1.2
        dvclive = 0.10.0
        scmrepo = 0.1.1
Supports:
        http (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.8.3)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc (subdir), git

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-09-20 15:36:25,727 DEBUG: Analytics is enabled.
2022-09-20 15:36:25,758 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmp04tcned9']'
2022-09-20 15:36:25,760 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmp04tcned9']'

mattseddon avatar Sep 20 '22 05:09 mattseddon

@karajan1001 I'm still seeing the same behaviour. Even with a fresh clone of https://github.com/iterative/vscode-dvc.

Sorry, I cannot reproduce this error.

I also have another question: will this PR, which changes the format of the JSON output of exp show, affect the current version of the VS Code extension?

BTW, I found that dvc exp show currently runs really slowly. With the help of cProfile we can see that the three major time costs in dvc exp show are the scm revision description, the stage collection, and the table rendering.
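
For readers who want to repeat this kind of measurement, here is a minimal sketch using the standard library's `cProfile` and `pstats`. The helper name `profile_call` is hypothetical, and the function being profiled is left to the caller; this is not the exact procedure used for the numbers above.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=3, **kwargs):
    """Profile one call and return (result, text report of the top entries)."""
    profiler = cProfile.Profile()
    # runcall() profiles just this one invocation and returns its result.
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```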

karajan1001 avatar Sep 21 '22 07:09 karajan1001

Sorry I can not reproduce this error.

Can you try with watch -n0 dvc exp show --show-json? I can reproduce with just two terminals:
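
The same reproduction can be scripted instead of watched by eye. The sketch below reruns a command in a tight loop and records the stderr of every failing run. `poll_command` is a hypothetical helper, and the attempt count and delay are arbitrary defaults.

```python
import subprocess
import time
from typing import List, Sequence


def poll_command(
    cmd: Sequence[str], attempts: int = 50, delay: float = 0.0
) -> List[str]:
    """Run cmd repeatedly and collect stderr from every non-zero exit."""
    failures = []
    for _ in range(attempts):
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            # Keep the message so intermittent errors can be inspected later.
            failures.append(proc.stderr.strip())
        time.sleep(delay)
    return failures
```

For example, `poll_command(["dvc", "exp", "show", "--show-json"])` could be run in one terminal while the queue is processing experiments in another.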

https://user-images.githubusercontent.com/37993418/191486910-60410f59-31ff-453f-9a94-53c107e26821.mov

https://user-images.githubusercontent.com/37993418/191488158-0312862c-4fac-46aa-a422-bede75f9e7e6.mov

mattseddon avatar Sep 21 '22 11:09 mattseddon

Sorry I can not reproduce this error.

Can you try with watch -n0 dvc exp show --show-json? I can reproduce with just two terminals:

Screen.Recording.2022-09-21.at.8.54.12.pm.mov Screen.Recording.2022-09-21.at.9.01.22.pm.mov

@mattseddon Now I understand. Previously I believed the error was a lasting one, but with the watch command I can see that it occurs in an intermediate state. Testing on my local computer, I found that the bug already existed before this PR, and that this PR solves the status out-of-sync problem of exp show.

We should open separate issues for the problems seen during exp show. What I have found so far:

  1. Invalid ref during the data collection.
  2. Task status turns to queued for 1 second before turning into success.
  3. Invalid ref during exp remove.

karajan1001 avatar Sep 22 '22 06:09 karajan1001

@mattseddon Are these issues blockers for you? If you are getting intermittent errors, is it possible to ignore those?

dberenbaum avatar Sep 22 '22 13:09 dberenbaum

@mattseddon Are these issues blockers for you? If you are getting intermittent errors, is it possible to ignore those?

This is a blocker. We cannot reliably ignore these errors as we cannot distinguish them from any other error type.

mattseddon avatar Sep 22 '22 23:09 mattseddon

@mattseddon Are these issues blockers for you? If you are getting intermittent errors, is it possible to ignore those?

This is a blocker. We cannot reliably ignore these errors as we cannot distinguish them from any other error type.

Thanks @mattseddon.

A few follow up questions:

  1. Should it block the current PR? I see the VS Code table disappearing for a bit both before and after this PR. After this PR, I at least see much quicker updates to the table when the queue is started. Do you see the same? Are any of these errors new to this PR? I'm wondering if we can merge and work on the issues mentioned by @karajan1001 as follow ups.
  2. Will the table disappear anytime there is an error returned by exp show? It seems like a strong assumption for a command that is constantly running in the background. For example, why not raise an error dialog but keep the last version of the table visible until the error is resolved? Or wait some number of iterations/amount of time before showing the error?
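
Both suggestions in item 2 can be sketched together. The extension itself is written in TypeScript; the Python below is only an illustration of the idea, and `TableState`, its attributes, and the error threshold are hypothetical.

```python
from typing import Callable, Optional


class TableState:
    """Keep the last good table and delay surfacing errors (hypothetical)."""

    def __init__(self, max_consecutive_errors: int = 3) -> None:
        self.max_consecutive_errors = max_consecutive_errors
        self.last_good: Optional[dict] = None
        self.consecutive_errors = 0
        self.show_error = False

    def refresh(self, fetch: Callable[[], dict]) -> Optional[dict]:
        try:
            self.last_good = fetch()
        except Exception:
            # Keep showing the stale table; only flag the error once it
            # has persisted for several refreshes in a row.
            self.consecutive_errors += 1
            self.show_error = (
                self.consecutive_errors >= self.max_consecutive_errors
            )
        else:
            self.consecutive_errors = 0
            self.show_error = False
        return self.last_good
```

The trade-off is that a stale table can hide a real failure for a few refresh cycles, which is why the threshold is left configurable.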

dberenbaum avatar Sep 23 '22 13:09 dberenbaum

  1. Should it block the current PR? I see the VS Code table disappearing for a bit both before and after this PR. After this PR, I at least see much quicker updates to the table when the queue is started. Do you see the same? Are any of these errors new to this PR? I'm wondering if we can merge and work on the issues mentioned by @karajan1001 as follow ups.

Doesn't need to block this PR.

  2. Will the table disappear anytime there is an error returned by exp show? It seems like a strong assumption for a command that is constantly running in the background. For example, why not raise an error dialog but keep the last version of the table visible until the error is resolved? Or wait some number of iterations/amount of time before showing the error?

Yes, it will disappear for any error. I would like to move away from the papering over the cracks approach that we have taken up until now.

mattseddon avatar Sep 25 '22 22:09 mattseddon

Yes, it will disappear for any error. I would like to move away from the papering over the cracks approach that we have taken up until now.

Agreed, but I'd consider these separate issues. Regardless of how stable the commands become, it still seems severe to me to have the table disappear in case an unknown error ever occurs. I would almost always prefer it to be stale than have it disappear. Is there a reason why dropping the table is considered preferable?

dberenbaum avatar Sep 26 '22 19:09 dberenbaum

I gathered some of the other problems I ran into while using exp show:

  1. exp show is slow in a repo with a large number of checkpoints (this looks related to the collection of every single checkpoint).
  2. The initialization of a temp workspace is slow (in Matt's demo it usually takes about half a minute on my computer).
  3. During that initialization we can't kill the queue tasks, because there is no info file at that point.

karajan1001 avatar Sep 27 '22 03:09 karajan1001