
`dvc studio login`: set up auto-pushing experiments

Open dberenbaum opened this issue 2 years ago • 9 comments

See ~#5029~ (edit: https://github.com/iterative/dvc.org/issues/5029) and the related issues linked there for background.

Rather than document the environment variables to auto push experiments, we could make this part of the studio login workflow since auto-pushing experiments is mostly useful when using studio rather than keeping experiments local. We would need to:

  1. Make config options like exp.auto_push and exp.git_remote
  2. During studio login, ask to set these options. The UI could look something like this:
$ dvc studio login
...
Authentication successful. The token will be available as risen-geum in Studio profile.
Do you want to push experiments automatically when they are completed [Y/n]?
Enter the Git remote to use [origin]:

dberenbaum avatar Dec 05 '23 02:12 dberenbaum

Since this is a studio login, I'd rather not have any prompts and enable everything by default. But we should notify users that these options are enabled and provide hints on how to disable them (and support an argument to disable this behaviour).

skshetry avatar Dec 05 '23 05:12 skshetry

cc @iterative/vs-code since this should also impact the vs code flow

dberenbaum avatar Dec 05 '23 14:12 dberenbaum

@skshetry Do you mean to enable them during studio login or some other time? Auto-pushing is pretty connected to the Studio workflow since it's where the pushed experiments appear, and I don't think it's worthwhile to auto-push them without Studio.

Regardless, I think we can do the first step of adding config options. Having to set environment variables every time to auto push doesn't make much sense.
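
A minimal sketch of how the two proposed options could be resolved once they exist as config. It assumes, which the thread does not state, that environment variables take precedence over config values. `resolve_auto_push` and the nested-dict config shape are hypothetical and not DVC's internal API; the `origin` fallback follows the default named in the task list.

```python
import os
from typing import Mapping, Optional, Tuple

# Strings treated as "enabled" when read from the environment (an assumption).
TRUTHY = {"1", "true", "yes", "y", "on"}


def resolve_auto_push(
    config: Mapping[str, Mapping[str, object]],
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[bool, str]:
    """Return (auto_push_enabled, git_remote) from env vars and config."""
    env = os.environ if env is None else env
    exp = config.get("exp", {})

    raw = env.get("DVC_EXP_AUTO_PUSH")
    if raw is not None:
        # Assumed precedence: the environment variable overrides config.
        enabled = raw.strip().lower() in TRUTHY
    else:
        enabled = bool(exp.get("auto_push", False))

    # Fall back to "origin" when neither source names a remote.
    remote = env.get("DVC_EXP_GIT_REMOTE") or exp.get("git_remote") or "origin"
    return enabled, str(remote)
```

With this shape, a value stored once in config replaces the need to export the variables in every shell session, while the variables remain usable as a per-run override.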

dberenbaum avatar Dec 05 '23 14:12 dberenbaum

> Do you mean to enable them during studio login or some other time?

Enabling them automatically during studio login (unless they are already disabled by other means).

skshetry avatar Dec 05 '23 14:12 skshetry

Not that strong an opinion, but gh auth login has prompts. While they can be clumsy, in this case there is already some interaction needed, so I didn't think prompts would be bad UX. What's your concern?

dberenbaum avatar Dec 05 '23 15:12 dberenbaum

From a new user's perspective, it might be confusing and unclear what to choose. "Do you want to push experiments?" - maybe, maybe not, idk. What are experiments? etc.

It would definitely lead to choice paralysis for me if I were using it for the first time. 😅

It's better to make a choice for them here. But the message should be clear that we are doing that. We want as few interactions as possible, and as few decisions for the user to make as possible.
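
The no-prompt behaviour described here could be sketched roughly as follows. `enable_auto_push_on_login`, its `opt_out` argument, the dict-based config, and the notice wording are all hypothetical illustrations, not DVC's actual login code; only the `exp.auto_push` and `exp.git_remote` option names come from this thread.

```python
from typing import Dict, List


def enable_auto_push_on_login(
    config: Dict[str, Dict[str, object]],
    opt_out: bool = False,
) -> List[str]:
    """Enable auto push without prompting and return the notices to show."""
    if opt_out:
        # The user passed the (hypothetical) opt-out argument: change nothing.
        return ["Automatic experiment push was not enabled."]

    exp = config.setdefault("exp", {})
    if "auto_push" in exp:
        # Respect a value the user already set by other means.
        return []

    exp["auto_push"] = True
    exp.setdefault("git_remote", "origin")
    return [
        f"Experiments will be pushed automatically to Git remote '{exp['git_remote']}'.",
        "To disable: dvc config exp.auto_push false",
    ]
```

The function makes the choice for the user but returns the messages that tell them what was enabled and how to undo it, so no decision is required during login.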

skshetry avatar Dec 05 '23 15:12 skshetry

We also need a way to auto push on exp save for dvclive-only experiments. DVC_EXP_AUTO_PUSH does not do this now.

dberenbaum avatar Dec 06 '23 14:12 dberenbaum

Thoughts on this approach?

  • Once you log in to Studio, everything will be pushed automatically unless you set it to offline, and we can make clear during login how to toggle offline mode
  • We can show a notification before starting the push making clear that if you don't want to wait, it's safe to cancel and you can always upload later with exp push
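
A rough sketch of the second point: announce the push, make cancelling safe, and point to the manual command. `push_with_hints` and its `push` callable are hypothetical stand-ins for whatever performs the upload, and the message wording is illustrative only.

```python
from typing import Callable


def push_with_hints(
    push: Callable[[], None],
    exp_name: str,
    git_remote: str = "origin",
    echo: Callable[[str], object] = print,
) -> bool:
    """Run `push`, telling the user up front that cancelling is safe."""
    echo(f"Pushing experiment '{exp_name}' to '{git_remote}'. It is safe to cancel with Ctrl+C.")
    echo(f"You can upload later with: dvc exp push {git_remote} {exp_name}")
    try:
        push()
    except KeyboardInterrupt:
        # The experiment itself already finished; only the upload was skipped.
        echo("Push cancelled. The experiment is still available locally.")
        return False
    return True
```

Catching `KeyboardInterrupt` only around the upload keeps a cancelled push from being reported as a failed experiment.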

dberenbaum avatar Dec 13 '23 23:12 dberenbaum

Not a requirement but nice to have would be to incorporate #8843 when doing this. If we can push the dvc-tracked data at the end of each stage, and include the run cache, it can help in scenarios like recovery from failed runners but also break up the pushes during the experiment run so the final push may not feel so painful.

dberenbaum avatar Jan 18 '24 16:01 dberenbaum

Tasks for this issue:

  • [x] Confirm DVC_EXP_AUTO_PUSH works as expected
  • [x] Make DVC_EXP_AUTO_PUSH default to use git remote origin (currently requires DVC_EXP_GIT_REMOTE)
  • [x] Make DVC_EXP_AUTO_PUSH work on dvc exp save
  • [x] Add config options for dvc config exp.auto_push and dvc config exp.git_remote
  • [x] Handle errors if no dvc or git remote
  • [x] Enable during dvc studio login with instructions or option to opt out
  • [x] During push, show useful messages in case it's slow (it's safe to cancel, how to upload later, how to disable push)
  • [x] Handle case where remote doesn't exist
  • [ ] Simplify ways to set git remote url
  • [x] Make auto push work with queue

Out of scope:

  • [ ] #8843

dberenbaum avatar Feb 20 '24 17:02 dberenbaum

@skshetry I updated the checklist above for what's left to do here.

dberenbaum avatar Mar 05 '24 13:03 dberenbaum

@dberenbaum, any thoughts on how to simplify?

skshetry avatar Mar 05 '24 14:03 skshetry

We could also make studio.repo_url an alias for exp.git_remote and deprecate it, so you can specify either a URL or a git remote name.

Originally posted by @skshetry in https://github.com/iterative/dvc.org/pull/5165#discussion_r1512677465

@skshetry This suggestion makes sense to me.
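
One reading of the alias suggestion above, as a minimal sketch: prefer `exp.git_remote`, and fall back to `studio.repo_url` with a deprecation warning. It assumes `studio.repo_url` is the option being deprecated. `read_git_remote` and the dict-based config are hypothetical, not DVC's config API.

```python
import warnings
from typing import Mapping, Optional


def read_git_remote(config: Mapping[str, Mapping[str, str]]) -> Optional[str]:
    """Return the Git remote name or URL, honouring the legacy option."""
    value = config.get("exp", {}).get("git_remote")
    if value:
        return value

    legacy = config.get("studio", {}).get("repo_url")
    if legacy:
        # Assumed direction of the deprecation: studio.repo_url is the old name.
        warnings.warn(
            "studio.repo_url is deprecated; use exp.git_remote instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return legacy
    return None
```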

dberenbaum avatar Mar 05 '24 16:03 dberenbaum

Added "Make auto push work with queue" to the task list. Currently, queued experiments fail because origin is not set in the queued repo:

$ dvc exp run --run-all
Following logs for all queued experiments. Use Ctrl+C to stop following logs (experiment execution will continue).

Reproducing experiment 'sober-daze'
Running stage 'train':
> python src/stages/train.py --config=params.yaml
WARNING: Failed to validate remotes. Disabling auto push: 'origin' is not a valid Git remote or URL

Ran experiment(s):
To apply the results of an experiment to your workspace run:

        dvc exp apply <exp>
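
The warning in the log rejects a value that is neither a configured remote name nor a URL. A minimal sketch of that kind of check follows, under the assumption that resolving the name to a URL in the original workspace would let a queued copy push without its own `origin`. `looks_like_git_url`, `resolve_git_remote`, the `remotes` mapping, and the example URL are hypothetical and do not reflect how DVC actually validates remotes.

```python
from typing import Mapping
from urllib.parse import urlparse


def looks_like_git_url(value: str) -> bool:
    """Heuristic check for common Git URL forms."""
    parsed = urlparse(value)
    if parsed.scheme in {"http", "https", "ssh", "git", "file"} and (parsed.netloc or parsed.path):
        return True
    # scp-like syntax, e.g. user@host:path/to/repo.git
    head, sep, tail = value.partition(":")
    return bool(sep) and "@" in head and "/" not in head and bool(tail)


def resolve_git_remote(value: str, remotes: Mapping[str, str]) -> str:
    """Map a remote name to its URL, or pass a URL through unchanged."""
    if value in remotes:
        return remotes[value]
    if looks_like_git_url(value):
        return value
    raise ValueError(f"'{value}' is not a valid Git remote or URL")
```

Calling `resolve_git_remote("origin", remotes)` with the workspace's remotes returns a URL that stays valid elsewhere, whereas the bare name fails in a repo where `origin` is not defined.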

dberenbaum avatar Mar 05 '24 18:03 dberenbaum