dvc Quick listing of stages

Open mribeirodantas opened this issue 4 years ago • 12 comments

Before the last release, reproducing my stages was handy thanks to filename completion on the command line, since the stages were files in the directory. All I had to do was type dvc repro, start typing the name, and hit TAB.

Now, stages are fields in a single YAML-formatted file. If I don't know the exact stage name, I must open the file, find the stage I am looking for, and check or copy-paste its name from there. Then I leave the file and type the command.

The ideal feature would be to auto-complete with TAB, just like before, but this may be outside the technical scope of DVC (a dirty fix would be to have empty files named after the stages, but I don't think that's a good solution...). Therefore, I think a feature that could improve usability would be a listing of stages. There could be a new option in the repro command such as dvc repro -l. This command would parse the dvc.yaml and list the stage names so that the user could type them by seeing the desired stage name on the same screen since it has just been printed out.
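A minimal sketch of what such a listing could do, assuming the stage names are the keys directly under a top-level `stages:` entry in dvc.yaml. The function name `list_stage_names` and the `cmd` values in the sample are hypothetical, and the line scanner below is a deliberately naive stand-in for a real YAML parser, not DVC's implementation:

```python
import re


def list_stage_names(dvc_yaml_text):
    """Return the keys found directly under the top-level `stages:` entry.

    Naive line scanner for illustration only; it does not handle the
    full YAML syntax the way a real parser would.
    """
    names = []
    in_stages = False
    stage_indent = None
    for line in dvc_yaml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0:
            # A new top-level key starts or ends the stages section.
            in_stages = stripped == "stages:"
            stage_indent = None
            continue
        if not in_stages:
            continue
        if stage_indent is None:
            stage_indent = indent  # indentation of the first stage key
        if indent == stage_indent:
            match = re.match(r"([^\s:#][^:]*):\s*(#.*)?$", stripped)
            if match:
                names.append(match.group(1).strip())
    return names


# Hypothetical file content; the commands are made up for the example.
EXAMPLE = """\
stages:
  prepare:
    cmd: python prepare.py
    outs:
      - data/prepared
  train:
    cmd: python train.py
    deps:
      - data/prepared
"""

print("\n".join(list_stage_names(EXAMPLE)))
```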

May 05 '20 17:05 mribeirodantas

Hi @mribeirodantas !

Sorry for the delay. We've changed the defaults in 1.0.0a1 to make dvc repro use dvc.yaml by default. Maybe that could help.

Regarding the shell completion, I think we can definitely implement that by doing something like dvc pipeline list inside the shell completion, but it is obviously not implemented yet, unfortunately :slightly_frowning_face:
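As a rough sketch of the completion side, assuming some listing command already provides the stage names: the shell hook would only need to filter them by what has been typed so far. `complete_stage` is a hypothetical helper, not an existing DVC function:

```python
def complete_stage(prefix, stage_names):
    """Return the stage names that start with what the user has typed."""
    return sorted(name for name in stage_names if name.startswith(prefix))


# Stage names as they might come from a listing command (assumed input).
stages = ["prepare", "featurize", "train", "evaluate"]

print(complete_stage("tr", stages))
print(complete_stage("", stages))
```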

> There could be a new option in the repro command such as dvc repro -l. This command would parse the dvc.yaml and list the stage names so that the user could type them by seeing the desired stage name on the same screen since it has just been printed out.

I'm not sure this is a good idea; it seems like something that repro shouldn't bother with. If you think that shell completion is the more desirable feature, then we should go straight to implementing it. It shouldn't be too hard to do.

May 08 '20 19:05 efiop

Yeah, I talked to @shcheklein about shell completion and read through the shell completion files (bash/zsh). I think that's something I would like to try to implement myself. Is that fine?

May 08 '20 20:05 mribeirodantas

@mribeirodantas that would be really cool and useful! :) btw, does dvc list show what we need in 1.0a? we can consider implementing specific options to make output machine parsable (json, or a pure list w/o headers) if needed.

May 08 '20 22:05 shcheklein

For the dvc list, I think it does.

May 09 '20 03:05 mribeirodantas

It looks like dvc pipeline list is not exactly what we want:

It also outputs DVC-files (inputs to the pipeline) - do we want to pass them to repro?
Output includes some delimiters, summary, etc - some option like --show-json is needed after all?

That's what it looks like for me right now:

dvc.yaml:prepare
dvc.yaml:featurize
dvc.yaml:train
dvc.yaml:evaluate
data/data.xml.dvc
================================================================================
1 pipelines total

May 10 '20 22:05 shcheklein

I think we do want to pass them to repro. The argument -p in dvc repro reproduces the stage that contains the specified dvc-tracked file, so it would be nice if dvc repro could also tab-complete the name of files contained in a stage.

About the format, we will have to parse it anyway, so whatever is printed, we can parse that and make sure it's tab-completable. What do you think?
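A sketch of what that parsing might involve, using the output pasted earlier in the thread as sample input. It assumes the delimiter is a line made only of `=` characters and that the summary has the form "N pipelines total"; `extract_targets` is a hypothetical helper:

```python
import re

# Output copied from the listing shown earlier in this thread.
SAMPLE_OUTPUT = """\
dvc.yaml:prepare
dvc.yaml:featurize
dvc.yaml:train
dvc.yaml:evaluate
data/data.xml.dvc
================================================================================
1 pipelines total
"""


def extract_targets(output):
    """Keep the lines that look like targets; drop delimiters and summary."""
    targets = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) == {"="}:
            continue  # delimiter row made only of '=' characters
        if re.fullmatch(r"\d+ pipelines? total", line):
            continue  # trailing summary line
        targets.append(line)
    return targets


for target in extract_targets(SAMPLE_OUTPUT):
    print(target)
```

Every rule in the filter mirrors a detail of the current human-readable layout, so any change to that layout would silently break the completion.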

May 11 '20 00:05 mribeirodantas

> ... could also tab-complete the name of files contained in a stage.

it's a good feature and I've been thinking about this. e.g. run dvc repro model.pkl would actually find the stage that corresponds to that output and reproduce the stage. As far as I remember it's not implemented yet - we can create a feature request - it sounds very reasonable to me, and definitely easier than dvc repro dvc.yaml:train that we have in 1.0a (cc @dmpetrov )
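A minimal sketch of that lookup, assuming each stage is available as a mapping with an `outs` list. The helper `stage_for_output` and the sample data (including which stage produces `model.pkl`) are hypothetical:

```python
def stage_for_output(path, stages):
    """Return the name of the first stage that lists `path` as an output."""
    for name, stage in stages.items():
        if path in stage.get("outs", []):
            return name
    return None  # no stage produces this path


# Made-up pipeline description for the example.
STAGES = {
    "featurize": {"outs": ["data/features"]},
    "train": {"outs": ["model.pkl"]},
}

print(stage_for_output("model.pkl", STAGES))
print(stage_for_output("unknown.bin", STAGES))
```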

> The argument -p in dvc repro reproduces the stage that contains the specified dvc-tracked file

I think it actually expects the stage DVC-file, not one of the outputs. Unless I'm missing something.

But even if it were the case, I would have expected something like dvc repro data/data.xml, not dvc repro data/data.xml.dvc. Or at least both of those, like I mentioned above.

> About the format, we will have to parse it anyway, so whatever is printed, we can parse that and make sure it's tab-completable. What do you think?

The usual problem here is that it means we make this output an API that we'll have to guarantee. Also, parsing will be pretty ad-hoc and weird. It is usually done with a special command. Here is a good guide on how to write good output - https://devcenter.heroku.com/articles/cli-style-guide#human-readable-output-vs-machine-readable-output . I would say it makes sense to completely redo the default output for this command, as well as introduce:

> ... When needed, commands should offer a --json and/or a --terse flag when valuable to allow users to easily parse and script the CLI. ...
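As an illustration of that guideline, here is a small sketch of a command that prints plain lines by default and a JSON document when `--json` is passed. The program name `list-stages`, the JSON shape, and the stage list are assumptions made for the example, not an existing DVC interface:

```python
import argparse
import json

# Stand-in data; a real command would read the stages from the project.
STAGES = ["prepare", "featurize", "train", "evaluate"]


def render(stages, as_json=False):
    """Format the stage list for humans (one per line) or for scripts (JSON)."""
    if as_json:
        return json.dumps({"stages": stages})
    return "\n".join(stages)


def main(argv=None):
    parser = argparse.ArgumentParser(prog="list-stages")
    parser.add_argument(
        "--json",
        action="store_true",
        dest="as_json",
        help="print machine-readable JSON instead of plain lines",
    )
    args = parser.parse_args(argv)
    print(render(STAGES, as_json=args.as_json))


# Explicit argument lists keep the example independent of sys.argv.
main([])
main(["--json"])
```

Scripts such as a completion hook would then depend only on the JSON shape, leaving the default output free to change.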

May 11 '20 04:05 shcheklein

We now have desc and size keywords in the stages, which we can use to our advantage to provide a --help-like message for the stages.

$ dvc stages
build-us: Builds a US specific model  (prepare -> process -> build-us)
build-gb: Builds a UK specific model  (prepare -> process -> build-gb)

We could even provide a default message, if desc does not exist, like:

$ dvc stages
build-us: Produces `model-us.hdf5` (7M), depends on `us-markets.csv`

Or, maybe both of those to create a verbose output and maybe even with more fields.

$ dvc stages
build-us: Builds a US specific model
          Produces model-us.hdf5 (7M)
          Depends on: `us-markets.csv`, etc.
build-gb: Builds a UK specific model
          Produces model-gb.hdf5 (7M)
          Depends on: `gb-markets.csv`, etc.

I might have gone over the top here in the suggestion, but the core of it is to list stages and provide a snippet of a helpful message (preferably with beautiful colours). :)
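A rough sketch of the fallback logic, assuming each stage is a mapping with optional `desc`, `outs`, and `deps` entries. `describe` is a hypothetical helper, and the size shown in the examples above is left out for brevity:

```python
def describe(name, stage):
    """Build a one-line summary: use `desc` if present, else outs and deps."""
    desc = stage.get("desc")
    if desc:
        return f"{name}: {desc}"
    parts = []
    outs = ", ".join(f"`{out}`" for out in stage.get("outs", []))
    deps = ", ".join(f"`{dep}`" for dep in stage.get("deps", []))
    if outs:
        parts.append(f"Produces {outs}")
    if deps:
        parts.append(f"depends on {deps}")
    if not parts:
        return f"{name}:"
    return f"{name}: " + ", ".join(parts)


# Sample stages modelled on the examples above; build-gb has no desc.
STAGES = {
    "build-us": {
        "desc": "Builds a US specific model",
        "outs": ["model-us.hdf5"],
        "deps": ["us-markets.csv"],
    },
    "build-gb": {"outs": ["model-gb.hdf5"], "deps": ["gb-markets.csv"]},
}

for stage_name, stage in STAGES.items():
    print(describe(stage_name, stage))
```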

Nov 26 '20 08:11 skshetry

Let's start with something simple like dvc stages <target> that lists all stages in the <target>. We could then use it in our autocompletion scripts (not necessarily part of the first step).

E.g.

$ dvc stages
data.dvc
dvc.yaml:stage1
dvc.yaml:stage2
path/to/dvc.yaml:stage3
path/to/other/dvc.yaml:stage4

or with target:

$ dvc stages dvc.yaml
dvc.yaml:stage1
dvc.yaml:stage2
$ dvc stages path
path/to/dvc.yaml:stage3
path/to/other/dvc.yaml:stage4

We can start with this being a default behavior for now, and we'll change it to something more verbose later as noted by @skshetry .
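The target behaviour in the examples above can be sketched as a filter over stage addresses, assuming a target is either a file (match it exactly) or a directory (match everything beneath it). `filter_by_target` is a hypothetical helper and the address list is copied from the example:

```python
from pathlib import PurePosixPath

# Addresses copied from the example listing above.
ADDRESSES = [
    "data.dvc",
    "dvc.yaml:stage1",
    "dvc.yaml:stage2",
    "path/to/dvc.yaml:stage3",
    "path/to/other/dvc.yaml:stage4",
]


def filter_by_target(addresses, target=None):
    """Keep addresses whose file is the target or lives under the target."""
    if target is None:
        return list(addresses)  # no target: list everything
    target_path = PurePosixPath(target)
    selected = []
    for address in addresses:
        # The file part is everything before the first ':' (if any).
        file_path = PurePosixPath(address.split(":", 1)[0])
        if file_path == target_path or target_path in file_path.parents:
            selected.append(address)
    return selected


print(filter_by_target(ADDRESSES, "dvc.yaml"))
print(filter_by_target(ADDRESSES, "path"))
```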

Nov 27 '20 07:11 efiop

> maybe both of those to create a verbose output and maybe even with more fields.
>
> $ dvc stages
> build-us: Builds a US specific model
>           Produces model-us.hdf5 (7M)
>           Depends on ...

This looks a lot like opening dvc.lock (maybe just copy over desc to the lockfile if we don't do so yet). It's a bit more human-friendly for sure though.

Jan 09 '21 03:01 jorgeorpinel

> It's a bit more human-friendly for sure though.

Jan 09 '21 03:01 skshetry

@skshetry Can we close this one as completed or as a duplicate of #5390?

Jul 08 '22 20:07 dberenbaum
