dvc icon indicating copy to clipboard operation
dvc copied to clipboard

more git hooks (dvc pull/push/status)

Open AlJohri opened this issue 6 years ago • 13 comments

can we add more git hooks for dvc pull/push/status when git pull/push/status are run? ideally, most git commands would have a hook. this will make it much less likely for someone to forget to dvc push after running a pipeline

AlJohri avatar Jan 30 '19 03:01 AlJohri

A new reason to prioritize this https://discordapp.com/channels/485586884165107732/485586884165107734/561380967311212554

shcheklein avatar Mar 30 '19 04:03 shcheklein

Another question:

is there anyway to have git hooks automatically install on pull or clone?
just thinking of first time developer experience, right now its quite a few steps

shcheklein avatar Mar 30 '19 04:03 shcheklein

Good point @shcheklein ! Added p1 priority, we'll try to take a look at it next week.

efiop avatar Mar 30 '19 10:03 efiop

After some quick research I have following conclusions:

  1. git pull - we can use post-merge hook to do dvc pull
  2. git push - I would like to discuss this one. It seems to me that we want to execute dvc push after successfull git push. However, all "post push hooks" (pre-receive, update, post-receive) are executed on push-side repository. We could use pre-push but with that we have to consider that dvc might take some time to push everything to cache, and then, in the end, git fails for some reason. It would be some inconvenience for the user. Though it does not seem as some repository breaking problem, as simple git reset and dvc pull should bring repository to original state.
  3. git status - it seems that we will be unable to implement this one, none of hooks is triggered upon status
  4. git clone - we can use post-checkout hook to trigger dvc checkout
  5. git checkout - i would consider creating issue for that one, because we can checkout whole branch or file, and in case of file we would need some logic to handle only checked out file, if git checks out stage file

pared avatar Apr 26 '19 23:04 pared

@pared

  1. git pull - we can use post-merge hook to do dvc pull

But not every pull involves merge. It might do rebase instead.

  1. git push - I would like to discuss this one. It seems to me that we want to execute dvc push after successfull git push. However, all "post push hooks" (pre-receive, update, post-receive) are executed on push-side repository. We could use pre-push but with that we have to consider that dvc might take some time to push everything to cache, and then, in the end, git fails for some reason. It would be some inconvenience for the user. Though it does not seem as some repository breaking problem, as simple git reset and dvc pull should bring repository to original state.

pre-push should be good enough :+1:

  1. git clone - we can use post-checkout hook to trigger dvc checkout
  2. git checkout - i would consider creating issue for that one, because we can checkout whole branch or file, and in case of file we would need some logic to handle only checked out file, if git checks out stage file

We already have post-checkot hook that simply runs dvc checkout(without arguments). Would it be possible to make it handle specific stage files as well?

efiop avatar Apr 27 '19 19:04 efiop

@efiop

But not every pull involves merge. It might do rebase instead.

Well we can use post-rewrite, according to the documentation it is triggered after amend or rebase, but git is supposed to pass argument indicating whether it is amend or rebase, so we could dvc pull upon rebase only.

Would it be possible to make it handle specific stage files as well?

According to documentation, git is supposed to pass flag whether it was branch or file, but it does not explicitly point which file. So i have been too fast with saying that it is possible. To use that information we would need mechanism to find checked out files. I think it is better to keep dvc checkout as it is.

pared avatar Apr 29 '19 18:04 pared

@pared

Well we can use post-rewrite, according to the documentation it is triggered after amend or rebase, but git is supposed to pass argument indicating whether it is amend or rebase, so we could dvc pull upon rebase only.

But that one is going to be called not only on git pull but also on regular amend and rebase. Not sure if it is something we actually want. What do you think?

According to documentation, git is supposed to pass flag whether it was branch or file, but it does not explicitly point which file. So i have been too fast with saying that it is possible. To use that information we would need mechanism to find checked out files. I think it is better to keep dvc checkout as it is.

Nice! Would it make sense to make our checkout hook not perform dvc checkout if it sees that it is a file that you are checking out? I imagine you git checkout-ing specific file and then our dvc checkout(without args) gets triggered and even though it has confirmation prompts, you might accidentally checkout some file that you didn't want.

So for now, only git push hook is the most straightforward that we could easily add, right?

Looks like even though some git hooks do suffice for some of our purposes, the support would still be far from perfect. Maybe having simple aliases like git push -> git push + dvc push would feel better? I don't quite like aliases on dvc's side(e.g. dvc push, or some other command, that would call git push) because they would bring a lot of hustle caused by passing through git options. On the other hand, things like aliases and/or git plugins would have to be handled and installed explicitly by a user, so it would add even more complication. I've been opposing this a lot since the early days, but considering everything (including requests for --autogitadd/commit feature), maybe dvc indeed should handle those. E.g. dvc config core.autoscm true -> would make dvc push automatically git push, dvc status automatically git status and so on? What do you guys think?

efiop avatar Apr 29 '19 21:04 efiop

But that one is going to be called not only on git pull but also on regular amend and rebase. Not sure if it is something we actually want. What do you think?

I must agree thats a bit of a bummer. We could solve amend vs rebase problem by reading flag passed by git, but problem "Was it pull or rebase" will remain unsolved. Both hooks (post-rewrite, post-merge) contain functionality that we need, but those details make them hard to use in our case.

Nice! Would it make sense to make our checkout hook not perform dvc checkout if it sees that it is a file that you are checking out?

I think we could do that, but then there is one problem left: what if checked out file was stage? In such case we should probably validate our stages against their outputs checksums.

So for now, only git push hook is the most straightforward that we could easily add, right?

I agree.

dvc config core.autoscm true -> would make dvc push automatically git push, dvc status automatically git status and so on?

I think that this idea is the best in terms of versatility. We tried the other way around (git hooks) and we already have problems with covering our commands with avialable hooks. I think that handling and debugginig this on dvc side with autoscm flag will be easier than handling issues about different OS-s that will come after introducing aliases.

pared avatar Apr 29 '19 22:04 pared

For the record: agreed privately that we would implement git push hook for now to provide at least some quick aid, but then we would think about autoscm and stuff like that.

efiop avatar Apr 30 '19 18:04 efiop

@AlJohri pre-push hook was released as a part of 0.40.1. Please feel free to give it a try 🙂

efiop avatar May 12 '19 11:05 efiop

I'm still interested in adding a git pull hook.

But that one is going to be called not only on git pull but also on regular amend and rebase. Not sure if it is something we actually want. What do you think?

I must agree thats a bit of a bummer. We could solve amend vs rebase problem by reading flag passed by git, but problem "Was it pull or rebase" will remain unsolved. Both hooks (post-rewrite, post-merge) contain functionality that we need, but those details make them hard to use in our case.

It seems like we can at least just implement it for the post-merge hook? For post-rewrite, I understand we can't easily identify between actual rebase vs pull --rebase but this only affects users who are running git pull --rebase, right?

I think the majority of users just use git pull without --rebase so can we can have it automatically dvc pull for at least these users?

AlJohri avatar Aug 25 '19 19:08 AlJohri

+1 for adding a git pull hook. (Just commenting to show there's still interest in this)

amin-nejad avatar Jan 08 '21 17:01 amin-nejad

Related: fds https://dagshub.com/blog/fds-fast-data-science-with-git-and-dvc/

casperdcl avatar Jun 22 '21 13:06 casperdcl