qri icon indicating copy to clipboard operation
qri copied to clipboard

bug(cmd save): qri save doesn't detect changes in body

Open Arqu opened this issue 4 years ago • 11 comments

Environment

OS info

ProductName:    Mac OS X
ProductVersion: 10.15.2
BuildVersion:   19C57

qri version

0.9.9-dev @ 47415190d37e5f25a8b3a3c60950467809ab2745

Expected Behavior

mkdir population_bank_world
cd population_bank_world
qri init
cp ../body.csv ./body.csv
qri save
echo "Zimbabwe,ZWE,\"Population, total\",SP.POP.TOTL,3776681,3905034,4039201,4178726,4322861,4471177,4623351,4779827,4941906,5111337,5289303,5476982,5673911,5877726,6085074,6293875,6502569,6712827,6929664,7160023,7408624,7675591,7958241,8254747,8562249,8877489,9200149,9527203,9849125,10153852,10432421,10680995,10900502,11092766,11261744,11410714,11541217,11653242,11747072,11822719,11881477,11923914,11954290,11982224,12019912,12076699,12155491,12255922,12379549,12526968,12697723,12894316,13115131,13350356,13586681,13814629,14030390,14236745,14439018,," >> ./body.csv
qri save

Expected to produce:

for linked dataset [arqu/population_bank_world]

dataset saved: arqu/population_bank_world@/ipfs/QmVx438tG5FZLM2xLaz9YD57RPSVhDR2fFv8QfaDuGRaok

Actual Behavior

for linked dataset [arqu/population_bank_world]

error saving: no changes

Other info

  • Dataset body taken from b5/world_bank_population

Arqu avatar Jun 10 '20 15:06 Arqu

Tried to reproduce but the save worked for me. The steps you gave, do they reliably work when run using a brand new user with an empty repo? (I used qriswitch to test those steps)

Other questions, after you've modified the body, right at the point when you run the second qri save, what do qri status and qri diff produce? What about qri log?

dustmop avatar Jun 10 '20 20:06 dustmop

$> qri log
1   Commit:  /ipfs/QmVx438tG5FZLM2xLaz9YD57RPSVhDR2fFv8QfaDuGRaok
    Date:    Wed Jun 10 11:20:46 CEST 2020
    Storage: local
    Size:    110 kB

    created dataset
$> qri status
for linked dataset [koka/population_bank_world]

  modified: body (source: body.csv)

run `qri save` to commit this dataset
$> qri diff
+330 elements. 0 inserts. 0 deletes.

body: [["Afghanistan","AFG","Population, total","SP.POP.TOTL",8996973,9169410,9351441,9543205,9744781,9956320,10174836,10399926,10637063,10893776,11173642,11475445,11791215,12108963,12412950,12689160,12943093,13171306,13341198,13411056,13356511,13171673,12882528,12537730,12204292,11938208,11736179,11604534,11618005,11868877,12412308,13299017,14485546,15816603,17075727,18110657,18853437,19357126,19737765,20170844,20779953,21606988,22600770,23680871,24726684,25654277,26433049,27100536,27722276,28394813,29185507,30117413,31161376,32269589,33370794,34413603,35383128,36296400,37172386,"",""],["Angola","AGO","Population, total","SP.POP.TOTL",5454933...
$> qri save
for linked dataset [koka/population_bank_world]

error saving: no changes

User is fresh, IPFS might be dirty. Will try to do another clean run. So far can reliably reproduce every time.

Edit: NVM, seems like IPFS is the issue. Was plenty dirty. Got the old repo backed up if you want to play but I'm happy to close this as I've fooled around plenty with it before this happened.

Arqu avatar Jun 10 '20 20:06 Arqu

Sure, send me along the backed up repo. Though it'll probably be lower priority if it's hard to reproduce. How were you using a fresh user but an existing ipfs repo?

I'm curious what causes this sort of thing to happen, and if there's a better way we can handle the error, instead of silently failing. Do you know what caused your ipfs repo to get into the "dirty" state that you believe was the root cause?

dustmop avatar Jun 10 '20 20:06 dustmop

I was manually re-doing my user but kept the .ipfs folder around. I'd say the probably biggest culprit is tons of the same user settings recreated while testing a local registry during development.

Agree it would be good to know how it can detect a diff but on save return there's no change. Other than that I'd just leave it on the back burner. I'll keep the backup around if this ever crops up and we can try to get to the bottom of it.

Arqu avatar Jun 10 '20 20:06 Arqu

So I'm re-opening this as I've managed to encounter this with an empty repo. Here's a branch that logs out the following (--log-all) https://github.com/qri-io/qri/commit/217b8dbd99b8a84c8c09c59124468d737f26bcba

$> qri save
for linked dataset [hepek/wbp]

⠋ DiffDescriptions 0
DiffDescriptions 1
buildComponentChanges HD:9, BD:3
buildComponentChanges type:replace
buildComponentChanges initCompChanges
buildComponentChanges type:+
buildComponentChanges initCompChanges
buildComponentChanges type:+
buildComponentChanges initCompChanges
buildComponentChanges assumeBody NOT Changed
buildBodyChanges parentPath: ''
buildBodyChanges delta: ' '
buildBodyChanges delta: ' '
buildBodyChanges delta: ' '
DiffDescriptions comp:meta
DiffDescriptions comp:structure
DiffDescriptions comp:readme
DiffDescriptions comp:viz
DiffDescriptions comp:transform
DiffDescriptions comp:body
DiffDescriptions 2
DiffDescriptions 3
error saving: no changes

$> qri save -m test produces the same result

Attached are the repo, config and dataset. bugdrop.zip

Notable differences in setup:

  • Using local registry

Arqu avatar Jun 11 '20 20:06 Arqu

Still unable to reproduce this. But looks at the logging output you added I'm noticing the following:

buildComponentChanges HD:9, BD:3
...
buildBodyChanges parentPath: ''
buildBodyChanges delta: ' '
buildBodyChanges delta: ' '
buildBodyChanges delta: ' '

This means that the bodyDeltas has 3 deltas in it, which is what gets returned from deepdiff. Iterating over those deltas, your code is printing the delta.Type, which is the ' '. This means, of course, DTContext https://github.com/qri-io/deepdiff/blob/master/delta.go#L13, which is represented by a space character and is used when nothing is different between the first and second values passed to deepdiff. So for whatever reason, the bodys that are handed off to deepdiff result in a diff with no changes.


One other thing, just to prevent following any incorrect leads, this part of the log:

buildComponentChanges assumeBody NOT Changed

should actually be stated as:

buildComponentChanges NOT assumeBodyChanged

in other words, we shouldn't assume the body has changed, because we should to inspect the result of deepdiff to determine that fact. The assumeBodyChanged is only true when the body is too large to diff, which is not the case here.

dustmop avatar Jun 12 '20 19:06 dustmop

So got another update on this. I can reliably reporduce this from a fresh repo. If the initial body is:

one,two,3
four,five,6

And I update it to:

one,two,3
four,five,6
four,five,6

It doesn't detect the changes, however if I instead update it to:

one,two,3
four,five,6
four,five,7

It works and saves correctly.

Now I'm not entirely sure to why it is happening (especially since we accurately detect the presence of a change on qri status), however my hunch would be that for status we only have a lightweight check like size/hash equality while the save path includes the differ to infer all the subesquent changes.

Now the differ might fail on the above if the rows are the same as it doesn't have enough context bellow/above and figures it is "the same row" and thus doesn't detect a "real" change in the contents. If you change the value, it obviously detects the row diff and then works as expected.

@dustmop @b5 any thoughts on this?

Arqu avatar Jul 02 '20 12:07 Arqu

Just to add newer info, seems like if I do qri save any number of times it fails. However if I do qri status and then qri save it picks up.

Arqu avatar Jul 29 '20 08:07 Arqu

Think this is another instance of this: https://github.com/qri-io/qri/issues/1505

Arqu avatar Sep 14 '20 16:09 Arqu

Would deeply enjoy fixing this bug, but likely it will require a rewrite, or at least a large overhaul of deepdiff. Part of the problem is that deepdiff contains non-determinism: https://github.com/qri-io/deepdiff/pull/7, but even aside from that, it returns incorrect results in certain known cases.

The save path uses deepdiff for 2 purposes: 1) determining whether something in the body has changed, and 2) deriving a textual message that describes what the change is. As long as deepdiff has these bugs, the save operation is going to behave incorrectly.

dustmop avatar Sep 14 '20 21:09 dustmop

hey just adding my own notation of seeing this bug, but running qri status before running qri save did not allow me to save

previous body:

one,two,3
four,five,6
seven,eight,9
ten,eleven,12

edited body:

one,two,3
four,five,6
seven,eight,9
ten,eleven,12
one,two,3
four,five,6
seven,eight,9
ten,eleven,12
one,two,3
four,five,6
seven,eight,9
ten,eleven,12
one,two,3
four,five,6
seven,eight,9
ten,eleven,12

output:

$ qri save --log-all
2020-10-06T20:06:45.408-0400	DEBUG	lib	lib/lib.go:382	--log-all set: turning on logging for all activity
2020-10-06T20:06:45.447-0400	DEBUG	cmd	cmd/qri.go:164	running cmd ["qri" "save" "--log-all"]
for linked dataset [ramfox_head/body]

2020-10-06T20:06:45.447-0400	DEBUG	base	base/dataset_prepare.go:38	PrepareSaveRef refStr="ramfox_head/body" bodyPathNameHint="" wantNewName=false
⠋ 2020-10-06T20:06:45.447-0400	DEBUG	base	base/dataset_prepare.go:101	PrepareSaveRef found previous initID="dkbbaysyxlxuxz3vq7ztc7r4o4h26gwqibrhiad5hq5svkc6ts4a"
2020-10-06T20:06:45.448-0400	DEBUG	base	base/save.go:25	SaveDataset initID="dkbbaysyxlxuxz3vq7ztc7r4o4h26gwqibrhiad5hq5svkc6ts4a" prevPath="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400	DEBUG	dsfs	dsfs/dataset.go:51	LoadDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400	DEBUG	dsfs	dsfs/dataset.go:77	LoadDatasetRefs path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400	DEBUG	dsfs	dsfs/dataset.go:111	DerefDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400	DEBUG	dsfs	dsfs/dataset.go:51	LoadDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400	DEBUG	dsfs	dsfs/dataset.go:77	LoadDatasetRefs path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400	DEBUG	dsfs	dsfs/dataset.go:111	DerefDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.449-0400	DEBUG	base	base/save.go:121	CreateDataset ds=&dataset.Dataset{bodyFile:(*localfs.LocalFile)(0xc00048dda0), Body:interface {}(nil), BodyBytes:[]uint8(nil), BodyPath:"/Users/ramfox/Work/datasets/body/body.csv", Commit:(*dataset.Commit)(0xc0005da230), Meta:(*dataset.Meta)(nil), Name:"body", Path:"/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU", Peername:"ramfox_head", PreviousPath:"/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU", ProfileID:"", Readme:(*dataset.Readme)(nil), NumVersions:0, Qri:"ds:0", Structure:(*dataset.Structure)(0xc000544320), Transform:(*dataset.Transform)(nil), Viz:(*dataset.Viz)(nil)} dsPrev=&dataset.Dataset{bodyFile:qipfs.ipfsFile{path:"/ipfs/Qmb8zEnXWaxNe83majrb9jmjUmAeat9do76AC1c1p8cb4i", r:(*unixfile.ufsFile)(0xc000284b50)}, Body:interface {}(nil), BodyBytes:[]uint8(nil), BodyPath:"/ipfs/Qmb8zEnXWaxNe83majrb9jmjUmAeat9do76AC1c1p8cb4i", Commit:(*dataset.Commit)(0xc0005dab60), Meta:(*dataset.Meta)(nil), Name:"", Path:"/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU", Peername:"", PreviousPath:"", ProfileID:"", Readme:(*dataset.Readme)(nil), NumVersions:0, Qri:"ds:0", Structure:(*dataset.Structure)(0xc000544140), Transform:(*dataset.Transform)(nil), Viz:(*dataset.Viz)(nil)}
2020-10-06T20:06:45.449-0400	DEBUG	dsfs	dsfs/dataset.go:254	CreateDataset
2020-10-06T20:06:45.449-0400	DEBUG	dsfs	dsfs/dataset.go:111	DerefDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.449-0400	DEBUG	dsfs	dsfs/dataset.go:111	DerefDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.449-0400	DEBUG	dsfs	dsfs/dataset.go:313	prepareDataset
2020-10-06T20:06:45.449-0400	DEBUG	dsfs	dsfs/dataset.go:520	set depth and entry count. depth=2 entries=16
2020-10-06T20:06:45.449-0400	DEBUG	dsfs	dsfs/dataset.go:567	set length and checksum. length=200 checksum="QmXieumd8M2v4zH6Egw5YmP9xUG9337SzXKbGtnhR5UTQx"
2020-10-06T20:06:45.450-0400	DEBUG	dsfs	dsfs/dataset.go:481	setErrorCount result ErrCount=0
2020-10-06T20:06:45.450-0400	DEBUG	dsfs	dsfs/dataset.go:578	generateCommitDescriptions
2020-10-06T20:06:45.450-0400	DEBUG	dsfs	dsfs/dataset.go:587	inlining body file to calulate a diff
2020-10-06T20:06:45.450-0400	DEBUG	dsfs	dsfs/dataset.go:773	calculating body statDiff type(prevBody)=[]interface {} type(nextBody)=[]interface {}
2020-10-06T20:06:45.451-0400	DEBUG	dsfs	dsfs/dataset.go:781	setting diff descriptions
2020-10-06T20:06:45.451-0400	DEBUG	friendly	friendly/friendly.go:26DiffDescriptions len(headDeltas)=9 len(bodyDeltas)=16 bodyStats=&{17 65 123 489 0 0 0} assumeBodyChanged=false
2020-10-06T20:06:45.451-0400	DEBUG	dsfs	dsfs/dataset.go:424	error saving: no changes
2020-10-06T20:06:45.451-0400	DEBUG	dsfs	dsfs/dataset.go:279	error saving: no changes
2020-10-06T20:06:45.451-0400	DEBUG	base	base/save.go:155	dsfs.CreateDataset: error saving: no changes
2020-10-06T20:06:45.451-0400	DEBUG	lib	lib/datasets.go:652	create ds error: error saving: no changes

ramfox avatar Oct 07 '20 00:10 ramfox