diff gets confused when the object is not in cache

Open skshetry opened this issue 3 years ago • 4 comments

#! /usr/bin/env bash

cd "$(mktemp -d)"
git init && dvc init
mkdir data
for i in $(seq 10); do
  echo "$i" >> "data/$i.txt"
done

dvc add data
git add -A && git commit -m "initialize"
rm -rf .dvc/cache  # clear cache
dvc diff

Expected

It should correctly show that these files are not in cache.

Actual

Added:
    data/1.txt
    data/10.txt
    data/2.txt
    data/3.txt
    data/4.txt
    data/5.txt
    data/6.txt
    data/7.txt
    data/8.txt
    data/9.txt

skshetry, Apr 29 '22 08:04

Looks like a regression introduced in #6595. Before that change, I get:

WARNING: dir cache entry for 'data' is missing
Added:
    data/1.txt
    data/10.txt
    data/2.txt
    data/3.txt
    data/4.txt
    data/5.txt
    data/6.txt
    data/7.txt
    data/8.txt
    data/9.txt

cc @efiop

dberenbaum, May 04 '22 00:05

@dberenbaum, I don't expect any warnings at all. We have a Not in cache list for this, so showing the files under Added is misleading.

skshetry, May 05 '22 03:05

True, you are right. They should all be shown under Not in cache.

dberenbaum, May 05 '22 14:05

There is a test similar to this report: with the cache folder deleted, the new items show up under "added" instead of under "not in cache".

  1. Does the test need to be changed?
  2. Since the cache might be deleted, do we need to check all the files in a_rev and the workspace (set(old).union(set(new))) to see if they are present in the cache?

https://github.com/iterative/dvc/blob/d92c6fed67828a992609c0f49548dba8b7fda15d/tests/func/test_diff.py#L58-L83
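
To make question 2 concrete, here is a rough sketch of the check I have in mind. This is a toy model, not DVC's actual code: `find_not_in_cache`, the path-to-hash dicts `old` and `new`, and the `cache` set of hashes are all hypothetical names made up for illustration.

```python
def find_not_in_cache(old, new, cache):
    """Return the sorted paths whose hash is missing from the cache.

    old, new: hypothetical dicts mapping path -> hash for a_rev and
              the workspace (not DVC's real data structures).
    cache:    hypothetical set of hashes currently present in the cache.
    """
    missing = []
    # Look at every path known to either side, as in set(old).union(set(new)).
    for path in sorted(set(old).union(set(new))):
        # A path may have a hash on one side only, or a different one on each.
        hashes = {h for h in (old.get(path), new.get(path)) if h is not None}
        if not hashes <= cache:
            missing.append(path)
    return missing


old = {"data/1.txt": "aaa", "data/2.txt": "bbb"}
new = {"data/2.txt": "bbb", "data/3.txt": "ccc"}
cache = {"bbb"}  # only one object survived in the cache

print(find_not_in_cache(old, new, cache))
```

With an empty `cache`, as after `rm -rf .dvc/cache`, every path from both sides would end up in the returned list, which is what I would expect to see under "not in cache" rather than "added".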

tirkarthi, May 11 '22 07:05