commit: support granularity

Open pared opened this issue 5 years ago • 8 comments

Update: see https://github.com/iterative/dvc/issues/4297#issuecomment-739848041

~~version : 1.2.2~~

~~committing an output instead of a stage, when there is no dvc.yaml, results in:~~

~~ERROR: failed to commit data - 'dvc.yaml' does not exist.~~

~~in 0.94.0 it used to be: ERROR: failed to commit data - bad DVC-file name 'data'. DVC-files should be named 'Dvcfile' or have a '.dvc' suffix (e.g. 'data.dvc').~~

~~Reproduction script:~~

#!/bin/bash

rm -rf repo
mkdir repo

pushd repo
git init --quiet
dvc init --quiet

echo data >> data
dvc add data

git add -A
git commit -am "init"

dvc commit data

NOTE: When dvc.yaml is present, error is more readable: failed to commit data - "Stage 'data' not found inside 'dvc.yaml' file"

pared avatar Jul 28 '20 13:07 pared

The problem here is that we are using collect instead of collect_granular when resolving the target. I don't see us using it pre-1.0, so it is likely just missing functionality. We probably just need to make the switch, but keep in mind that one might supply a file inside a tracked directory, which needs to be handled appropriately (e.g. committing the whole stage is probably not the desired behaviour). CC @skshetry, or am I missing something and we've done that for a reason?

efiop avatar Nov 19 '20 00:11 efiop

@efiop, I don't think it was ever supported. Switching from collect to collect_granular would help, but it won't make the commit granular at the output level: it will still commit all of the stage's outputs.

skshetry avatar Nov 22 '20 06:11 skshetry
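
To make the distinction above concrete, here is a minimal sketch of what resolving a target granularly could mean: map a path to the output that owns it, plus the sub-path to restrict to. This is not DVC's actual implementation; `resolve_target` and its return shape are hypothetical names chosen for illustration, and only the term `filter_info` is borrowed from the thread.

```python
from pathlib import PurePosixPath
from typing import Optional, Sequence, Tuple


def resolve_target(
    target: str, outputs: Sequence[str]
) -> Optional[Tuple[str, Optional[str]]]:
    """Hypothetical helper: find the tracked output that owns `target`.

    Returns (owning_output, filter_info), where filter_info is None when
    the target is the output itself, and the target path when the target
    lies inside a tracked directory. Returns None for untracked paths.
    """
    target_path = PurePosixPath(target)
    for out in outputs:
        out_path = PurePosixPath(out)
        if target_path == out_path:
            # Output-level granularity: the whole output is the target.
            return out, None
        if out_path in target_path.parents:
            # File inside a tracked directory: remember the sub-path.
            return out, str(target_path)
    return None


tracked = ["datafile", "datadir"]
whole = resolve_target("datadir", tracked)
nested = resolve_target("datadir/subdir/file", tracked)
missing = resolve_target("untracked", tracked)
```

Without the second element of the pair, a caller only knows which stage or output the path belongs to, which is why committing would otherwise cover everything in it.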

@skshetry Thanks! So looks like collect_granular + filter_info support for commit and we should be set :slightly_smiling_face:

efiop avatar Nov 22 '20 23:11 efiop

Discussed with @skshetry that we could start with just output-level granularity, e.g.

dvc add datafile
...
dvc commit datafile

dvc add datadir
...
dvc commit datadir

More granular commits like

dvc add datadir
....
dvc commit datadir/subdir/file

would require a bit more work on the cache.save side, for it to accept filter_info and handle it properly, so we could do it as the next step after output-level granularity.

efiop avatar Dec 07 '20 11:12 efiop
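
As a rough illustration of the second step described above, the sketch below shows how a save routine might narrow a directory's file list once it is given a `filter_info` path. It is an assumption-laden toy, not the real cache.save: `select_files` is a hypothetical helper, and the file listing is made up for the example.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Optional


def select_files(
    files: Iterable[str], filter_info: Optional[str] = None
) -> List[str]:
    """Hypothetical helper: pick the files a granular save would touch.

    With no filter_info, every file of the output is selected
    (output-level granularity). Otherwise only the file at filter_info,
    or the files below it when it is a directory, are selected.
    """
    if filter_info is None:
        return sorted(files)
    prefix = PurePosixPath(filter_info)
    return sorted(
        f
        for f in files
        if PurePosixPath(f) == prefix or prefix in PurePosixPath(f).parents
    )


# Made-up contents of a tracked directory, for illustration only.
datadir_files = [
    "datadir/a",
    "datadir/subdir/file",
    "datadir/subdir/other",
]
everything = select_files(datadir_files)
one_file = select_files(datadir_files, "datadir/subdir/file")
one_subdir = select_files(datadir_files, "datadir/subdir")
```

The first call corresponds to `dvc commit datadir`, the second to `dvc commit datadir/subdir/file` in the examples above.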

@efiop I would like to start working on this.

mbiesek avatar Dec 12 '20 21:12 mbiesek

@mbiesek Great! Let us know if you'll have any questions :slightly_smiling_face:

efiop avatar Dec 13 '20 00:12 efiop

#6195 fixed the cache poisoning that occurred when trying to use granular commit. We do support granular commit, but the content that we write to dvc.yaml/.dvc still describes the complete set of files.

skshetry avatar Jul 14 '21 10:07 skshetry

@skshetry Can we close this one as completed or maybe open a new issue/edit the title to reflect the remaining work you would like to address? I think the original issue is solved, right?

dberenbaum avatar Jul 08 '22 21:07 dberenbaum