
`dvc push` doesn't update cloud info with cloud versioned remotes

Open shcheklein opened this issue 2 years ago • 4 comments

Context: https://github.com/iterative/dvc/discussions/9907#discussioncomment-6990738

Let's say we have two remotes:

['remote "dev"']
    url = s3://yury-cloud-versioning-test/test-dev
    version_aware=true
['remote "prod"']
    url = s3://yury-cloud-versioning-test/test-prod
    version_aware=true

Let's say we need to migrate data from one to another.

I would expect commands like this:

dvc pull -r dev
dvc push -r prod

to work and update the .dvc / dvc.lock files with the appropriate info. In reality I'm getting:

(.venv) √ Projects/test-cloud-versioned % git diff
diff --git a/test.txt.dvc b/test.txt.dvc
index d5fbaa9..03f455c 100644
--- a/test.txt.dvc
+++ b/test.txt.dvc
@@ -4,6 +4,6 @@ outs:
   hash: md5
   path: test.txt
   cloud:
-    dev:
+    prod:
       etag: d8e8fca2dc0f896fd7cb4cb0031ba249
       version_id: UK3s0VcueuAIttMw7FROG8pRospYWNQI

(Only the remote name is updated; the etag and version_id stay the same, which is wrong for that remote.)
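To make the symptom concrete, here is a minimal sketch that flags this pattern when comparing the `cloud` section before and after the push. The function name `stale_cloud_entries` is hypothetical and this is not part of DVC; it only assumes that a version_id should never carry over unchanged to a differently named remote.

```python
def stale_cloud_entries(old_cloud, new_cloud):
    """Return remote names in new_cloud whose version_id was carried over
    unchanged from a differently named remote in old_cloud.

    Hypothetical helper for illustration only, not a DVC API.
    """
    stale = []
    for new_name, new_info in new_cloud.items():
        version_id = new_info.get("version_id")
        if version_id is None:
            continue
        for old_name, old_info in old_cloud.items():
            if old_name != new_name and old_info.get("version_id") == version_id:
                stale.append(new_name)
                break
    return stale


# Values taken from the git diff above.
before = {
    "dev": {
        "etag": "d8e8fca2dc0f896fd7cb4cb0031ba249",
        "version_id": "UK3s0VcueuAIttMw7FROG8pRospYWNQI",
    }
}
after = {
    "prod": {
        "etag": "d8e8fca2dc0f896fd7cb4cb0031ba249",
        "version_id": "UK3s0VcueuAIttMw7FROG8pRospYWNQI",
    }
}

print(stale_cloud_entries(before, after))
```

With the diff above, the `prod` entry is reported because its version_id is identical to the one previously recorded for `dev`.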

In the original issue, even the object is not pushed to the new remote.

Also, with cloud versioning I don't think the names prod / dev make much sense in .dvc / dvc.lock. A version_id is unique (I assume) and can't repeat in a different location, so I guess we need to use some hash, or the location itself, in this case. How do we use these names at all? Do we expect that specific remote name to exist in a config?

shcheklein avatar Sep 14 '23 21:09 shcheklein

Related: https://github.com/iterative/dvc/issues/8356 https://github.com/iterative/dvc/pull/8862

@skshetry Can you remember what the expected behavior is here? Should we be overwriting the remote info? Or disallowing this operation?

dberenbaum avatar Sep 19 '23 15:09 dberenbaum

> How do we use these names at all? Do we expect that specific remote name to exist in a config?

Yes, they are tied to the remote name defined in that git commit's .dvc/config.
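As a rough illustration of that lookup, the sketch below resolves a remote name recorded in a .dvc file to its URL using the config from the top of this issue. It uses Python's standard `configparser` purely for demonstration, which is an assumption and not how DVC loads its config; `remote_urls` is a hypothetical helper.

```python
import configparser

# Config copied from the issue description.
CONFIG_TEXT = """\
['remote "dev"']
    url = s3://yury-cloud-versioning-test/test-dev
    version_aware=true
['remote "prod"']
    url = s3://yury-cloud-versioning-test/test-prod
    version_aware=true
"""


def remote_urls(config_text):
    """Map remote names to URLs (hypothetical helper, not a DVC API)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    urls = {}
    for section in parser.sections():
        # Section names keep their quoting, e.g. 'remote "dev"'.
        cleaned = section.strip("'")
        if cleaned.startswith('remote "') and cleaned.endswith('"'):
            name = cleaned[len('remote "'):-1]
            urls[name] = parser[section].get("url")
    return urls


urls = remote_urls(CONFIG_TEXT)
print(urls.get("prod"))     # URL for the name recorded in the .dvc file
print(urls.get("staging"))  # None: a name the config does not define
```

The point is only that the name stored under `cloud:` means nothing by itself; it has to be resolved against the config at that commit, and a renamed or missing remote leaves the recorded version info without a location.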

> Can you remember what the expected behavior is here

We should be pushing the file to the prod remote and updating the version ID and etag here.

I'm guessing this is an index bug caused by both remotes using the same S3 bucket. It looks like the user from the original discussion/context is using dvc==3.0.0, so this may just be a duplicate of https://github.com/iterative/dvc/issues/9904 (which is fixed in the latest release).

pmrowla avatar Sep 21 '23 17:09 pmrowla

@pmrowla I think I was able to reproduce it on the recent DVC version.

shcheklein avatar Sep 21 '23 18:09 shcheklein

I can also reproduce this with a single remote by changing the remote path. DVC will update the remote name but won't notice that it needs to push again.

It seems like DVC isn't checking whether the version IDs actually exist. I remember discussing this in https://github.com/iterative/dvc/pull/8766, but I think @pmrowla rightly saw it as dangerous and we continued to check which versions were available.
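A minimal sketch of the check being described, assuming the push decision should depend on whether the recorded version ID exists in the target remote. `FakeVersionedRemote` and `needs_push` are hypothetical names used with an in-memory stand-in for a bucket; none of this is DVC's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeVersionedRemote:
    """In-memory stand-in for a version-aware remote (hypothetical)."""

    url: str
    # Maps an object path to the set of version IDs stored for it.
    versions: dict = field(default_factory=dict)

    def has_version(self, path, version_id):
        return version_id in self.versions.get(path, set())


def needs_push(remote, path, recorded):
    """Return True unless the recorded version ID exists in `remote`.

    `recorded` is the cloud info from the .dvc file (or None).
    """
    if not recorded or not recorded.get("version_id"):
        return True
    return not remote.has_version(path, recorded["version_id"])


recorded = {"version_id": "UK3s0VcueuAIttMw7FROG8pRospYWNQI"}
dev = FakeVersionedRemote(
    url="s3://yury-cloud-versioning-test/test-dev",
    versions={"test.txt": {"UK3s0VcueuAIttMw7FROG8pRospYWNQI"}},
)
prod = FakeVersionedRemote(url="s3://yury-cloud-versioning-test/test-prod")

print(needs_push(dev, "test.txt", recorded))   # already present in dev
print(needs_push(prod, "test.txt", recorded))  # missing from prod
```

Under this assumption, renaming the remote or changing its path would still trigger a push, because the decision is keyed on what the target location contains and not on the recorded name.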

dberenbaum avatar Oct 20 '23 14:10 dberenbaum