exp show: sync state between queue and exp show table
fix: #8088
- Refactor: separate the initialization of the executor from the environment setup.
- Move ref setup into executor.init_git.
- Add a new attribute status to the ExecutorInfo file.
- Update the running status in the executor infofile.
- Use task status to replace collected.
- Move some basic test scripts from functional tests to unit tests.
- Add success/failed tests for the status change in the tempdir, celery, and workspace running cases.
- [x] ❗ I have followed the Contributing to DVC checklist.
- [ ] 📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
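To make the new status attribute listed above easier to picture, here is a minimal sketch of an info file that carries a task status. It is an illustration only, not the code in this PR: `InfoSketch` and its fields are hypothetical, and of the `TaskStatus` members only `PREPARING` is named in this thread, so the other members and all numeric values are assumptions.

```python
import json
from dataclasses import asdict, dataclass
from enum import IntEnum
from pathlib import Path


class TaskStatus(IntEnum):
    # Only PREPARING appears in this thread; the rest are assumed.
    PENDING = 0
    PREPARING = 1
    RUNNING = 2
    SUCCESS = 3
    FAILED = 4


@dataclass
class InfoSketch:
    """Hypothetical stand-in for the contents of an executor info file."""

    name: str
    status: TaskStatus = TaskStatus.PENDING

    def dump(self, path: Path) -> None:
        # An IntEnum member is serialized as a plain integer.
        path.write_text(json.dumps(asdict(self)))

    @classmethod
    def load(cls, path: Path) -> "InfoSketch":
        data = json.loads(path.read_text())
        return cls(name=data["name"], status=TaskStatus(data["status"]))
```

A reader such as exp show could then call `InfoSketch.load(path).status` instead of inferring state from whether results were collected.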
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
Looks like I need to clean up the manager, or pylint will fail.
Excuse me, @mattseddon, could you please help me verify whether the problem is solved by this PR in the VS Code extension? Thank you.
I will test 👍🏻.
This is the experience in the extension using dvc queue start -j 1 with 3 queued experiments:
https://user-images.githubusercontent.com/37993418/190292424-3ee67d58-7405-4ec8-9567-b974333c31d9.mov
It is an issue that dvc exp show can run into unexpected errors that look like this:
ERROR: unexpected error - Invalid revision: b'02712dc464ab868043e7eefc335a8d5fd39ab6f7'
Question: Is running in exp show synced with TaskStatus.PREPARING in the executor or is there another field in the output that I should be looking for?
Note: Probably unrelated to this change but I started by trying to run with -j 3 and saw some very weird behaviour.
I went through the following steps:
- installed git+https://github.com/karajan1001/dvc.git@fix8088 into a demo project's virtual environment
- dvc exp run --queue x 3 with different params for each
- dvc queue start -j 3
- One experiment succeeded and two failed.
After those failures, any attempt to queue an experiment would result in the experiment being run straight away, even though queue status stated that there were no active workers:
~/demo main !2 ?1 ❯ dvc queue status
Task Name Created Status
e7bf66b 10:35 AM Failed
ffc912f 10:35 AM Failed
08a047c exp-92e70 10:35 AM Success
6e259ab exp-684d6 10:41 AM Success
aad7cf0 exp-14b96 10:40 AM Success
Worker status: 0 active, 0 idle
~/demo main !2 ?1 ❯ dvc exp run --queue
Queued experiment '2f7751d' for future execution.
~/demo main !2 ?1 ❯ dvc queue status
Task Name Created Status
2f7751d 10:42 AM Running
e7bf66b 10:35 AM Failed
ffc912f 10:35 AM Failed
08a047c exp-92e70 10:35 AM Success
6e259ab exp-684d6 10:41 AM Success
aad7cf0 exp-14b96 10:40 AM Success
Worker status: 0 active, 0 idle
The only way that I could get the repo out of this state was to delete .dvc/tmp/exps. dvc queue stop & dvc queue kill had no impact.
For a job count of 3, you need to test it after https://github.com/iterative/dvc-task/pull/90 is merged.
exp show reads the TaskStatus of each experiment, but it does not depend on that alone, because the TaskStatus is only generated after the experiment begins to run.
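A minimal sketch of that fallback, assuming the recorded statuses and the set of queued revisions are already available as plain Python objects. `resolve_status` and both of its inputs are hypothetical names, not functions from DVC.

```python
from typing import Dict, Optional, Set


def resolve_status(
    rev: str,
    recorded: Dict[str, str],
    queued: Set[str],
) -> Optional[str]:
    """Pick the status to display for one experiment revision."""
    # Prefer the status written by the executor once the task has started.
    if rev in recorded:
        return recorded[rev]
    # No status has been written yet: fall back to what the queue knows.
    if rev in queued:
        return "Queued"
    # Neither source knows this revision.
    return None
```

For example, `resolve_status("2f7751d", {}, {"2f7751d"})` yields `"Queued"` because no status has been written for that revision yet.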
Verbose log for error:
~/projects/vscode-dvc/demo main *1 !4 ?1 ❯ dvc exp show --show-json -v ✘ 252 18s .env base 10:23:44
2022-09-16 10:23:52,332 ERROR: unexpected error - Invalid revision: b'd0b057085e1e96b3406f78cd9cb2decfb86976b3'
------------------------------------------------------------
Traceback (most recent call last):
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 652, in fetch_refspecs
check_diverged(
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dulwich/porcelain.py", line 347, in check_diverged
raise DivergedBranches(current_sha, new_sha)
dulwich.porcelain.DivergedBranches: b'd0b057085e1e96b3406f78cd9cb2decfb86976b3'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 745, in diff
commit_a = self.repo[os.fsencode(rev_a)]
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dulwich/repo.py", line 787, in __getitem__
return self.object_store[self.refs[name]]
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dulwich/refs.py", line 320, in __getitem__
raise KeyError(name)
KeyError: b'd0b057085e1e96b3406f78cd9cb2decfb86976b3'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
ret = cmd.do_run()
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/cli/command.py", line 22, in do_run
return self.run()
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/commands/experiments/show.py", line 475, in run
all_experiments = self.repo.experiments.show(
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 516, in show
return show(self.repo, *args, **kwargs)
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/show.py", line 163, in show
running = repo.experiments.get_running_exps(fetch_refs=fetch_running)
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 443, in get_running_exps
self._fetch_running_exp(
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 482, in _fetch_running_exp
for ref in executor.fetch_exps(
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 358, in fetch_exps
dest_scm.fetch_refspecs(
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 289, in _backend_func
result = func(*args, **kwargs)
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 658, in fetch_refspecs
on_diverged(
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 349, in on_diverged_ref
self._raise_ref_conflict(
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 734, in _raise_ref_conflict
if scm.diff(orig_rev, new_rev):
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 289, in _backend_func
result = func(*args, **kwargs)
File "/Users/mattseddon/projects/vscode-dvc/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 748, in diff
raise RevError("Invalid revision") from exc
scmrepo.exceptions.RevError: Invalid revision
------------------------------------------------------------
2022-09-16 10:23:52,444 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/.UUrYrHCXCoi7th3ronGrot.tmp'
2022-09-16 10:23:52,444 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/.UUrYrHCXCoi7th3ronGrot.tmp'
2022-09-16 10:23:52,445 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/.UUrYrHCXCoi7th3ronGrot.tmp'
2022-09-16 10:23:52,445 DEBUG: Removing '/Users/mattseddon/projects/vscode-dvc/demo/.dvc/cache/.Smc3Bw3TUaCcMUU6quBhPH.tmp'
2022-09-16 10:23:52,445 DEBUG: Version info for developers:
DVC version: 1.0.2.dev2348+gb4beb4e8
---------------------------------
Platform: Python 3.9.13 on macOS-12.6-arm64-arm-64bit
Subprojects:
dvc_data = 0.7.1
dvc_objects = 0.2.2
dvc_render = 0.0.10
dvc_task = 0.1.2
dvclive = 0.10.0
scmrepo = 0.1.1
Supports:
http (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
s3 (s3fs = 2022.8.2, boto3 = 1.24.59)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: s3
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc (subdir), git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-09-16 10:23:52,446 DEBUG: Analytics is enabled.
2022-09-16 10:23:52,482 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmpyxrdv_qe']'
2022-09-16 10:23:52,484 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmpyxrdv_qe']'
@mattseddon
The previous problem occurred because collecting the results failed during the git pull operations (because of duplicated experiment names), which made the final result collection fail. I have now moved all of the final dump operations into the cleanup function, which guarantees that they run in a finally scope.
Could you please try it again? (I guess your local repo might have been polluted by the previous test and might need to be cleaned manually. Building a completely new workspace might help, but since the previous error was caused by a dirty environment, a newly built one might not trigger it.)
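The idea of dumping the final state in a finally scope can be sketched roughly as follows. This is an illustration under assumptions rather than the code in this PR: `run_and_record`, the JSON layout, and the status strings are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable


def run_and_record(infofile: Path, run: Callable[[], None]) -> None:
    """Run a task and always write its final status to the info file."""
    status = "FAILED"  # assumed default until the task proves otherwise
    infofile.write_text(json.dumps({"status": "RUNNING"}))
    try:
        run()
        status = "SUCCESS"
    finally:
        # Executed on success and on any exception, so the info file
        # never stays at RUNNING after the task has ended.
        infofile.write_text(json.dumps({"status": status}))
```

The exception still propagates to the caller; the finally block only makes sure the status on disk is updated first.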
I will test today.
@karajan1001 I'm still seeing the same behaviour. Even with a fresh clone of https://github.com/iterative/vscode-dvc:
dvc exp show --show-json -v
2022-09-20 15:36:25,649 ERROR: unexpected error - Invalid revision: b'1dbb61e08d6d96c6db910a6d392cb1a1bdb9d04d'
------------------------------------------------------------
Traceback (most recent call last):
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 652, in fetch_refspecs
check_diverged(
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dulwich/porcelain.py", line 347, in check_diverged
raise DivergedBranches(current_sha, new_sha)
dulwich.porcelain.DivergedBranches: b'2e1b8fbd00600a7457fb91fa14fa7d248a73913b'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 746, in diff
commit_b = self.repo[os.fsencode(rev_b)]
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dulwich/repo.py", line 787, in __getitem__
return self.object_store[self.refs[name]]
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dulwich/refs.py", line 320, in __getitem__
raise KeyError(name)
KeyError: b'1dbb61e08d6d96c6db910a6d392cb1a1bdb9d04d'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/cli/__init__.py", line 185, in main
ret = cmd.do_run()
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/cli/command.py", line 22, in do_run
return self.run()
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/commands/experiments/show.py", line 475, in run
all_experiments = self.repo.experiments.show(
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 516, in show
return show(self.repo, *args, **kwargs)
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/show.py", line 163, in show
running = repo.experiments.get_running_exps(fetch_refs=fetch_running)
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 443, in get_running_exps
self._fetch_running_exp(
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/__init__.py", line 482, in _fetch_running_exp
for ref in executor.fetch_exps(
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 363, in fetch_exps
dest_scm.fetch_refspecs(
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 289, in _backend_func
result = func(*args, **kwargs)
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 658, in fetch_refspecs
on_diverged(
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 354, in on_diverged_ref
self._raise_ref_conflict(
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/dvc/repo/experiments/executor/base.py", line 739, in _raise_ref_conflict
if scm.diff(orig_rev, new_rev):
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/__init__.py", line 289, in _backend_func
result = func(*args, **kwargs)
File "/Users/mattseddon/projects/vc-nc1/demo/.env/lib/python3.9/site-packages/scmrepo/git/backend/dulwich/__init__.py", line 748, in diff
raise RevError("Invalid revision") from exc
scmrepo.exceptions.RevError: Invalid revision
------------------------------------------------------------
2022-09-20 15:36:25,725 DEBUG: Removing '/Users/mattseddon/projects/vc-nc1/.RzXRu93vW73iJFXLaqEGG2.tmp'
2022-09-20 15:36:25,725 DEBUG: Removing '/Users/mattseddon/projects/vc-nc1/.RzXRu93vW73iJFXLaqEGG2.tmp'
2022-09-20 15:36:25,725 DEBUG: Removing '/Users/mattseddon/projects/vc-nc1/.RzXRu93vW73iJFXLaqEGG2.tmp'
2022-09-20 15:36:25,725 DEBUG: Removing '/Users/mattseddon/projects/vc-nc1/demo/.dvc/cache/.SR4jZgarDBcjiLMyqZnbZg.tmp'
2022-09-20 15:36:25,726 DEBUG: Version info for developers:
DVC version: 1.0.2.dev2371+g94c458d3
---------------------------------
Platform: Python 3.9.13 on macOS-12.6-arm64-arm-64bit
Subprojects:
dvc_data = 0.10.0
dvc_objects = 0.4.0
dvc_render = 0.0.11
dvc_task = 0.1.2
dvclive = 0.10.0
scmrepo = 0.1.1
Supports:
http (aiohttp = 3.8.1, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.1, aiohttp-retry = 2.8.3)
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: https
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc (subdir), git
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2022-09-20 15:36:25,727 DEBUG: Analytics is enabled.
2022-09-20 15:36:25,758 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmp04tcned9']'
2022-09-20 15:36:25,760 DEBUG: Spawned '['daemon', '-q', 'analytics', '/var/folders/sb/fqcw44jd19nfrhl_9lz_81d80000gn/T/tmp04tcned9']'
Sorry, I cannot reproduce this error.
I also have another question: will this PR (which changes the format of the JSON output of exp show) affect the current version of the VS Code extension?
BTW, I found that the current dvc exp show runs really slowly. With the help of cProfile we can see that the three major time costs in dvc exp show are the scm revision description, the stage collection, and the table rendering.
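For anyone who wants to repeat this kind of measurement, here is a minimal sketch using the standard-library cProfile and pstats modules. `profile_call` is a hypothetical helper, and which DVC function to pass to it is left open.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

Sorting by cumulative time is what makes coarse buckets such as revision description or table rendering stand out in the report.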

Can you try with watch -n0 dvc exp show --show-json? I can reproduce with just two terminals:
https://user-images.githubusercontent.com/37993418/191486910-60410f59-31ff-453f-9a94-53c107e26821.mov
https://user-images.githubusercontent.com/37993418/191488158-0312862c-4fac-46aa-a422-bede75f9e7e6.mov
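If watch is not available, roughly the same repeated polling can be sketched in Python. `poll_command` is a hypothetical helper; it only reruns a command and records non-zero exits, so it makes no assumptions about DVC internals.

```python
import subprocess
import time
from typing import List, Sequence, Tuple


def poll_command(
    cmd: Sequence[str],
    iterations: int,
    delay: float = 0.0,
) -> List[Tuple[int, int, str]]:
    """Run cmd repeatedly and collect (iteration, exit code, stderr) for failures."""
    failures = []
    for i in range(iterations):
        proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
        if proc.returncode != 0:
            failures.append((i, proc.returncode, proc.stderr.strip()))
        time.sleep(delay)
    return failures
```

For example, run `poll_command(["dvc", "exp", "show", "--show-json"], iterations=100)` in one terminal while the queue is processing experiments in another, then inspect the returned failures.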
@mattseddon
Now I understand. Previously, I believed the error was a persistent one, but with the watch command I can see that it occurs in an intermediate state. Testing on my local computer, I found that the bug existed before this PR, and that this PR solves the status out-of-sync problem of exp show.
We should open separate issues for the other problems seen during exp show. What I have found so far includes:
- invalid ref during the data collection.
- task status turning to queued for 1 second before turning to success.
- invalid ref during exp remove.
@mattseddon Are these issues blockers for you? If you are getting intermittent errors, is it possible to ignore those?
This is a blocker. We cannot reliably ignore these errors as we cannot distinguish them from any other error type.
Thanks @mattseddon.
A few follow up questions:
- Should it block the current PR? I see the VS Code table disappearing for a bit both before and after this PR. After this PR, I at least see much quicker updates to the table when the queue is started. Do you see the same? Are any of these errors new to this PR? I'm wondering if we can merge and work on the issues mentioned by @karajan1001 as follow ups.
- Will the table disappear anytime there is an error returned by exp show? It seems like a strong assumption for a command that is constantly running in the background. For example, why not raise an error dialog but keep the last version of the table visible until the error is resolved? Or wait some number of iterations/amount of time before showing the error?
Doesn't need to block this PR.
Yes, it will disappear for any error. I would like to move away from the papering over the cracks approach that we have taken up until now.
Agreed, but I'd consider these separate issues. Regardless of how stable the commands become, it still seems severe to me to have the table disappear whenever an unknown error occurs. I would almost always prefer it to be stale than have it disappear. Is there a reason dropping the table is considered preferable?
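To make the stale-table idea concrete, here is a minimal sketch of the behaviour I have in mind. It is written in Python purely for illustration and does not describe how the extension works; `StaleTolerantView` and `max_failures` are hypothetical names.

```python
from typing import Any, Optional, Tuple


class StaleTolerantView:
    """Keep showing the last good table and delay reporting errors."""

    def __init__(self, max_failures: int = 3) -> None:
        self.max_failures = max_failures
        self.last_good: Optional[Any] = None
        self.failures = 0

    def update(
        self, data: Optional[Any], error: Optional[str] = None
    ) -> Tuple[Optional[Any], Optional[str]]:
        """Return (table to show, error to show) for one polling result."""
        if error is None:
            # A successful poll replaces the table and resets the counter.
            self.last_good = data
            self.failures = 0
            return data, None
        self.failures += 1
        # Only surface the error after several consecutive failures.
        shown = error if self.failures >= self.max_failures else None
        return self.last_good, shown
```

With `max_failures=3`, one or two intermittent errors leave the previous table on screen with no error shown, while a persistent error is reported on the third failure without dropping the table.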
I gathered some of the other problems I ran into while using exp show:
- exp show is slow in a repo with a large number of checkpoints (this looks related to the collection of every single checkpoint).
- The initialization of a temp workspace is slow (in Matt's demo it usually takes about half a minute on my computer).
- During the initialization above, we can't kill the queue tasks, because there is no info file during this process.