diff gets confused when the object is not in cache
#! /usr/bin/env bash
cd "$(mktemp -d)"
git init && dvc init
mkdir data
for i in $(seq 10); do
echo $i >> data/$i.txt
done
dvc add data
git add -A && git commit -m "initialize"
rm -rf .dvc/cache # clear cache
dvc diff
Expected
dvc diff should correctly report that the files are not in cache.
Actual
Added:
data/1.txt
data/10.txt
data/2.txt
data/3.txt
data/4.txt
data/5.txt
data/6.txt
data/7.txt
data/8.txt
data/9.txt
Looks like a regression introduced in #6595. Before that change, I got:
WARNING: dir cache entry for 'data' is missing
Added:
data/1.txt
data/10.txt
data/2.txt
data/3.txt
data/4.txt
data/5.txt
data/6.txt
data/7.txt
data/8.txt
data/9.txt
cc @efiop
@dberenbaum, I would expect no warnings at all. We have a "Not in cache" list for this, so "Added" is misleading.
True, you are right. They should all be shown under "Not in cache".
There is a test similar to this report: with the cache folder deleted, the new items show up under "Added" instead of "Not in cache".
- Does the test need to be changed?
- Since the cache might be deleted, do we need to check all the files in a_rev and the workspace (set(old).union(set(new))) to see if they are present in the cache?
https://github.com/iterative/dvc/blob/d92c6fed67828a992609c0f49548dba8b7fda15d/tests/func/test_diff.py#L58-L83
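To make the second question concrete, here is a rough sketch of what checking the union of both sides against the cache could look like. It does not use DVC's internals: find_not_in_cache, the path-to-hash dicts old and new, and the in_cache callback are all hypothetical names made up for illustration, and a plain Python set stands in for the cache.

```python
from typing import Callable, Dict, List


def find_not_in_cache(
    old: Dict[str, str],
    new: Dict[str, str],
    in_cache: Callable[[str], bool],
) -> List[str]:
    """Return paths whose recorded hash has no object in the cache.

    `old` and `new` map path -> hash for the two sides of the diff
    (hypothetical stand-ins for a_rev and the workspace).
    """
    missing = []
    # Look at every path known to either side, not only the new ones.
    for path in sorted(set(old).union(set(new))):
        hashes = {h for h in (old.get(path), new.get(path)) if h is not None}
        if any(not in_cache(h) for h in hashes):
            missing.append(path)
    return missing


# Toy data: both sides track the same two files, and the cache is empty,
# as it would be after `rm -rf .dvc/cache`.
old = {"data/1.txt": "hash-1", "data/2.txt": "hash-2"}
new = {"data/1.txt": "hash-1", "data/2.txt": "hash-2"}
cache = set()

print(find_not_in_cache(old, new, cache.__contains__))
```

With a populated cache the same call returns an empty list, so under this approach the files would land under "Not in cache" only when their objects are actually missing.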