vscode-dvc icon indicating copy to clipboard operation
vscode-dvc copied to clipboard

Use new data status command

Open mattseddon opened this issue 3 years ago • 6 comments

1/2 main <- this <- #2151

This PR integrates our DVC Tracked tree, SCM View and File Decorations with the new dvc data status command. That means we no longer have to call dvc status, dvc list & dvc diff 🎉. We also don't need the help of the exp show data to work out which paths we can feed directly into dvc pull without causing errors.

I have found a few issues with the way that the new command behaves and I have raised issues in the DVC repository:

  1. https://github.com/iterative/dvc/issues/8052
  2. https://github.com/iterative/dvc/issues/8053
  3. https://github.com/iterative/dvc/issues/8062

~None of the above issues are blockers as I have been able to mostly patch over the problems.~ The blocking issue that I cannot fix is not all paths are returned when there is no cache (part of https://github.com/iterative/dvc/issues/8062).

Other Blockers:

We need https://github.com/iterative/dvc/issues/8051 to be fixed and released by DVC before we can safely increment our minimum version of the CLI. That needs to happen before we merge and release this PR.

Note: There are some other fixes in DVC's main branch that we also need to be made available before we can safely increment our minimum version.

Demos:

Not in cache

https://user-images.githubusercontent.com/37993418/181851488-8e8d4961-8c71-442e-ac0f-8d37c6ff0216.mov

Add Data

https://user-images.githubusercontent.com/37993418/181852487-435ca9a6-7855-41aa-9693-847790ce2473.mov

Commit modified data

https://user-images.githubusercontent.com/37993418/181853496-d513f445-52a4-4cf2-985a-3f8c9e66cee8.mov

mattseddon avatar Jul 25 '22 09:07 mattseddon

@mattseddon, is it possible to know how much this improved/regressed the performance of the extension (from the command invocation to completely rendering in the filetree)?

We are tracking the VSCode's usecase in https://bench.dvc.org, but I'd love to see e2e results:

Name (time in s)                                          Min                Max               Mean            StdDev             Median               IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_data_status-data-changed[main-all_flags]        30.5379 (3.70)     30.5379 (3.70)     30.5379 (3.70)     0.0000 (1.0)      30.5379 (3.70)     0.0000 (1.0)           0;0  0.0327 (0.27)          1           1
test_data_status-data-changed-noop[main-all_flags]   31.1806 (3.76)     31.1806 (3.76)     31.1806 (3.76)     0.0000 (1.0)      31.1806 (3.76)     0.0000 (1.0)           0;0  0.0321 (0.27)          1           1
test_data_status-data-new[main-all_flags]            59.5360 (1.54)     59.5360 (1.54)     59.5360 (1.54)     0.0000 (1.0)      59.5360 (1.54)     0.0000 (1.0)           0;0  0.0168 (0.65)          1           1
test_data_status-data-noop[main-all_flags]           28.5594 (3.64)     28.5594 (3.64)     28.5594 (3.64)     0.0000 (1.0)      28.5594 (3.64)     0.0000 (1.0)           0;0  0.0350 (0.27)          1           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

skshetry avatar Jul 27 '22 08:07 skshetry

@mattseddon, is it possible to know how much this improved/regressed the performance of the extension (from the command invocation to completely rendering in the filetree)?

We are tracking the VSCode's usecase in https://bench.dvc.org, but I'd love to see e2e results:

Name (time in s)                                          Min                Max               Mean            StdDev             Median               IQR            Outliers     OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_data_status-data-changed[main-all_flags]        30.5379 (3.70)     30.5379 (3.70)     30.5379 (3.70)     0.0000 (1.0)      30.5379 (3.70)     0.0000 (1.0)           0;0  0.0327 (0.27)          1           1
test_data_status-data-changed-noop[main-all_flags]   31.1806 (3.76)     31.1806 (3.76)     31.1806 (3.76)     0.0000 (1.0)      31.1806 (3.76)     0.0000 (1.0)           0;0  0.0321 (0.27)          1           1
test_data_status-data-new[main-all_flags]            59.5360 (1.54)     59.5360 (1.54)     59.5360 (1.54)     0.0000 (1.0)      59.5360 (1.54)     0.0000 (1.0)           0;0  0.0168 (0.65)          1           1
test_data_status-data-noop[main-all_flags]           28.5594 (3.64)     28.5594 (3.64)     28.5594 (3.64)     0.0000 (1.0)      28.5594 (3.64)     0.0000 (1.0)           0;0  0.0350 (0.27)          1           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

These are rough numbers (and for a small repo) but the amount of time to first render the DVC Tracked tree on startup for example-get-started has gone from ~4700ms => ~1600ms on my machine. There is some lag time in there for the rest of the extension to startup but it will be common across both observations. The performance improvement is noticeable under general usage 🙏🏻.

mattseddon avatar Jul 27 '22 09:07 mattseddon

I don't know if it's necessarily related to this PR, but I'm able to consistently get the demo DVC repo in a state where dvc exp show itself thinks an experiment is running, but no experiment is and I must rm -rf .dvc to get experiments usable again. There's probably a more specific file/folder in .dvc that is the problem, but I tried each individual lockfile and didn't get results.

https://user-images.githubusercontent.com/9111807/182227747-b3457f7c-fac8-4df1-85e8-d146a9e62641.mp4

rogermparent avatar Aug 01 '22 19:08 rogermparent

I don't know if it's necessarily related to this PR, but I'm able to consistently get the demo DVC repo in a state where dvc exp show itself thinks an experiment is running, but no experiment is and I must rm -rf .dvc to get experiments usable again. There's probably a more specific file/folder in .dvc that is the problem, but I tried each individual lockfile and didn't get results.

experiments-cancel-stuck-on-running-bug-demo.mp4

@rogermparent sounds like https://github.com/iterative/dvc/issues/8045. Try using main.

mattseddon avatar Aug 01 '22 19:08 mattseddon

main did indeed solve my issue!

rogermparent avatar Aug 01 '22 23:08 rogermparent

Code Climate has analyzed commit 5f88d9ff and detected 0 issues on this pull request.

The test coverage on the diff in this pull request is 100.0% (85% is the threshold).

This pull request will bring the total coverage in the repository to 96.7% (0.0% change).

View more on Code Climate.

qlty-cloud-legacy[bot] avatar Aug 31 '22 22:08 qlty-cloud-legacy[bot]