qri
qri copied to clipboard
bug(cmd save): qri save doesn't detect changes in body
Environment
OS info
ProductName: Mac OS X
ProductVersion: 10.15.2
BuildVersion: 19C57
qri version
0.9.9-dev @ 47415190d37e5f25a8b3a3c60950467809ab2745
Expected Behavior
mkdir population_bank_world
cd population_bank_world
qri init
cp ../body.csv ./body.csv
qri save
echo "Zimbabwe,ZWE,\"Population, total\",SP.POP.TOTL,3776681,3905034,4039201,4178726,4322861,4471177,4623351,4779827,4941906,5111337,5289303,5476982,5673911,5877726,6085074,6293875,6502569,6712827,6929664,7160023,7408624,7675591,7958241,8254747,8562249,8877489,9200149,9527203,9849125,10153852,10432421,10680995,10900502,11092766,11261744,11410714,11541217,11653242,11747072,11822719,11881477,11923914,11954290,11982224,12019912,12076699,12155491,12255922,12379549,12526968,12697723,12894316,13115131,13350356,13586681,13814629,14030390,14236745,14439018,," >> ./body.csv
qri save
Expected to produce:
for linked dataset [arqu/population_bank_world]
dataset saved: arqu/population_bank_world@/ipfs/QmVx438tG5FZLM2xLaz9YD57RPSVhDR2fFv8QfaDuGRaok
Actual Behavior
for linked dataset [arqu/population_bank_world]
error saving: no changes
Other info
- Dataset body taken from
b5/world_bank_population
Tried to reproduce but the save worked for me. The steps you gave, do they reliably work when run using a brand new user with an empty repo? (I used qriswitch to test those steps)
Other questions, after you've modified the body, right at the point when you run the second qri save
, what do qri status
and qri diff
produce? What about qri log
?
$> qri log
1 Commit: /ipfs/QmVx438tG5FZLM2xLaz9YD57RPSVhDR2fFv8QfaDuGRaok
Date: Wed Jun 10 11:20:46 CEST 2020
Storage: local
Size: 110 kB
created dataset
$> qri status
for linked dataset [koka/population_bank_world]
modified: body (source: body.csv)
run `qri save` to commit this dataset
$> qri diff
+330 elements. 0 inserts. 0 deletes.
body: [["Afghanistan","AFG","Population, total","SP.POP.TOTL",8996973,9169410,9351441,9543205,9744781,9956320,10174836,10399926,10637063,10893776,11173642,11475445,11791215,12108963,12412950,12689160,12943093,13171306,13341198,13411056,13356511,13171673,12882528,12537730,12204292,11938208,11736179,11604534,11618005,11868877,12412308,13299017,14485546,15816603,17075727,18110657,18853437,19357126,19737765,20170844,20779953,21606988,22600770,23680871,24726684,25654277,26433049,27100536,27722276,28394813,29185507,30117413,31161376,32269589,33370794,34413603,35383128,36296400,37172386,"",""],["Angola","AGO","Population, total","SP.POP.TOTL",5454933...
$> qri save
for linked dataset [koka/population_bank_world]
error saving: no changes
User is fresh, IPFS might be dirty. Will try to do another clean run. So far can reliably reproduce every time.
Edit: NVM, seems like IPFS is the issue. Was plenty dirty. Got the old repo backed up if you want to play but I'm happy to close this as I've fooled around plenty with it before this happened.
Sure, send me along the backed up repo. Though it'll probably be lower priority if it's hard to reproduce. How were you using a fresh user but an existing ipfs repo?
I'm curious what causes this sort of thing to happen, and if there's a better way we can handle the error, instead of silently failing. Do you know what caused your ipfs repo to get into the "dirty" state that you believe was the root cause?
I was manually re-doing my user but kept the .ipfs folder around. I'd say the probably biggest culprit is tons of the same user settings recreated while testing a local registry during development.
Agree it would be good to know how it can detect a diff but on save return there's no change. Other than that I'd just leave it on the back burner. I'll keep the backup around if this ever crops up and we can try to get to the bottom of it.
So I'm re-opening this as I've managed to encounter this with an empty repo.
Here's a branch that logs out the following (--log-all
) https://github.com/qri-io/qri/commit/217b8dbd99b8a84c8c09c59124468d737f26bcba
$> qri save
for linked dataset [hepek/wbp]
⠋ DiffDescriptions 0
DiffDescriptions 1
buildComponentChanges HD:9, BD:3
buildComponentChanges type:replace
buildComponentChanges initCompChanges
buildComponentChanges type:+
buildComponentChanges initCompChanges
buildComponentChanges type:+
buildComponentChanges initCompChanges
buildComponentChanges assumeBody NOT Changed
buildBodyChanges parentPath: ''
buildBodyChanges delta: ' '
buildBodyChanges delta: ' '
buildBodyChanges delta: ' '
DiffDescriptions comp:meta
DiffDescriptions comp:structure
DiffDescriptions comp:readme
DiffDescriptions comp:viz
DiffDescriptions comp:transform
DiffDescriptions comp:body
DiffDescriptions 2
DiffDescriptions 3
error saving: no changes
$> qri save -m test
produces the same result
Attached are the repo, config and dataset. bugdrop.zip
Notable differences in setup:
- Using local registry
Still unable to reproduce this. But looks at the logging output you added I'm noticing the following:
buildComponentChanges HD:9, BD:3
...
buildBodyChanges parentPath: ''
buildBodyChanges delta: ' '
buildBodyChanges delta: ' '
buildBodyChanges delta: ' '
This means that the bodyDeltas has 3 deltas in it, which is what gets returned from deepdiff
. Iterating over those deltas, your code is printing the delta.Type, which is the ' '
. This means, of course, DTContext
https://github.com/qri-io/deepdiff/blob/master/delta.go#L13, which is represented by a space character and is used when nothing is different between the first and second values passed to deepdiff. So for whatever reason, the bodys that are handed off to deepdiff result in a diff with no changes.
One other thing, just to prevent following any incorrect leads, this part of the log:
buildComponentChanges assumeBody NOT Changed
should actually be stated as:
buildComponentChanges NOT assumeBodyChanged
in other words, we shouldn't assume the body has changed, because we should to inspect the result of deepdiff to determine that fact. The assumeBodyChanged
is only true when the body is too large to diff, which is not the case here.
So got another update on this. I can reliably reporduce this from a fresh repo. If the initial body is:
one,two,3
four,five,6
And I update it to:
one,two,3
four,five,6
four,five,6
It doesn't detect the changes, however if I instead update it to:
one,two,3
four,five,6
four,five,7
It works and saves correctly.
Now I'm not entirely sure to why it is happening (especially since we accurately detect the presence of a change on qri status
), however my hunch would be that for status we only have a lightweight check like size/hash equality while the save path includes the differ to infer all the subesquent changes.
Now the differ might fail on the above if the rows are the same as it doesn't have enough context bellow/above and figures it is "the same row" and thus doesn't detect a "real" change in the contents. If you change the value, it obviously detects the row diff and then works as expected.
@dustmop @b5 any thoughts on this?
Just to add newer info, seems like if I do qri save
any number of times it fails. However if I do qri status
and then qri save
it picks up.
Think this is another instance of this: https://github.com/qri-io/qri/issues/1505
Would deeply enjoy fixing this bug, but likely it will require a rewrite, or at least a large overhaul of deepdiff. Part of the problem is that deepdiff contains non-determinism: https://github.com/qri-io/deepdiff/pull/7, but even aside from that, it returns incorrect results in certain known cases.
The save path uses deepdiff for 2 purposes: 1) determining whether something in the body has changed, and 2) deriving a textual message that describes what the change is. As long as deepdiff has these bugs, the save operation is going to behave incorrectly.
hey just adding my own notation of seeing this bug, but running qri status
before running qri save
did not allow me to save
previous body:
one,two,3
four,five,6
seven,eight,9
ten,eleven,12
edited body:
one,two,3
four,five,6
seven,eight,9
ten,eleven,12
one,two,3
four,five,6
seven,eight,9
ten,eleven,12
one,two,3
four,five,6
seven,eight,9
ten,eleven,12
one,two,3
four,five,6
seven,eight,9
ten,eleven,12
output:
$ qri save --log-all
2020-10-06T20:06:45.408-0400 DEBUG lib lib/lib.go:382 --log-all set: turning on logging for all activity
2020-10-06T20:06:45.447-0400 DEBUG cmd cmd/qri.go:164 running cmd ["qri" "save" "--log-all"]
for linked dataset [ramfox_head/body]
2020-10-06T20:06:45.447-0400 DEBUG base base/dataset_prepare.go:38 PrepareSaveRef refStr="ramfox_head/body" bodyPathNameHint="" wantNewName=false
⠋ 2020-10-06T20:06:45.447-0400 DEBUG base base/dataset_prepare.go:101 PrepareSaveRef found previous initID="dkbbaysyxlxuxz3vq7ztc7r4o4h26gwqibrhiad5hq5svkc6ts4a"
2020-10-06T20:06:45.448-0400 DEBUG base base/save.go:25 SaveDataset initID="dkbbaysyxlxuxz3vq7ztc7r4o4h26gwqibrhiad5hq5svkc6ts4a" prevPath="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400 DEBUG dsfs dsfs/dataset.go:51 LoadDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400 DEBUG dsfs dsfs/dataset.go:77 LoadDatasetRefs path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400 DEBUG dsfs dsfs/dataset.go:111 DerefDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400 DEBUG dsfs dsfs/dataset.go:51 LoadDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400 DEBUG dsfs dsfs/dataset.go:77 LoadDatasetRefs path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.448-0400 DEBUG dsfs dsfs/dataset.go:111 DerefDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.449-0400 DEBUG base base/save.go:121 CreateDataset ds=&dataset.Dataset{bodyFile:(*localfs.LocalFile)(0xc00048dda0), Body:interface {}(nil), BodyBytes:[]uint8(nil), BodyPath:"/Users/ramfox/Work/datasets/body/body.csv", Commit:(*dataset.Commit)(0xc0005da230), Meta:(*dataset.Meta)(nil), Name:"body", Path:"/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU", Peername:"ramfox_head", PreviousPath:"/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU", ProfileID:"", Readme:(*dataset.Readme)(nil), NumVersions:0, Qri:"ds:0", Structure:(*dataset.Structure)(0xc000544320), Transform:(*dataset.Transform)(nil), Viz:(*dataset.Viz)(nil)} dsPrev=&dataset.Dataset{bodyFile:qipfs.ipfsFile{path:"/ipfs/Qmb8zEnXWaxNe83majrb9jmjUmAeat9do76AC1c1p8cb4i", r:(*unixfile.ufsFile)(0xc000284b50)}, Body:interface {}(nil), BodyBytes:[]uint8(nil), BodyPath:"/ipfs/Qmb8zEnXWaxNe83majrb9jmjUmAeat9do76AC1c1p8cb4i", Commit:(*dataset.Commit)(0xc0005dab60), Meta:(*dataset.Meta)(nil), Name:"", Path:"/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU", Peername:"", PreviousPath:"", ProfileID:"", Readme:(*dataset.Readme)(nil), NumVersions:0, Qri:"ds:0", Structure:(*dataset.Structure)(0xc000544140), Transform:(*dataset.Transform)(nil), Viz:(*dataset.Viz)(nil)}
2020-10-06T20:06:45.449-0400 DEBUG dsfs dsfs/dataset.go:254 CreateDataset
2020-10-06T20:06:45.449-0400 DEBUG dsfs dsfs/dataset.go:111 DerefDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.449-0400 DEBUG dsfs dsfs/dataset.go:111 DerefDataset path="/ipfs/QmZSWMCbjmFxxhWYC2pX1AGSiZunPS8g72xFF92yPBX4aU"
2020-10-06T20:06:45.449-0400 DEBUG dsfs dsfs/dataset.go:313 prepareDataset
2020-10-06T20:06:45.449-0400 DEBUG dsfs dsfs/dataset.go:520 set depth and entry count. depth=2 entries=16
2020-10-06T20:06:45.449-0400 DEBUG dsfs dsfs/dataset.go:567 set length and checksum. length=200 checksum="QmXieumd8M2v4zH6Egw5YmP9xUG9337SzXKbGtnhR5UTQx"
2020-10-06T20:06:45.450-0400 DEBUG dsfs dsfs/dataset.go:481 setErrorCount result ErrCount=0
2020-10-06T20:06:45.450-0400 DEBUG dsfs dsfs/dataset.go:578 generateCommitDescriptions
2020-10-06T20:06:45.450-0400 DEBUG dsfs dsfs/dataset.go:587 inlining body file to calulate a diff
2020-10-06T20:06:45.450-0400 DEBUG dsfs dsfs/dataset.go:773 calculating body statDiff type(prevBody)=[]interface {} type(nextBody)=[]interface {}
2020-10-06T20:06:45.451-0400 DEBUG dsfs dsfs/dataset.go:781 setting diff descriptions
2020-10-06T20:06:45.451-0400 DEBUG friendly friendly/friendly.go:26DiffDescriptions len(headDeltas)=9 len(bodyDeltas)=16 bodyStats=&{17 65 123 489 0 0 0} assumeBodyChanged=false
2020-10-06T20:06:45.451-0400 DEBUG dsfs dsfs/dataset.go:424 error saving: no changes
2020-10-06T20:06:45.451-0400 DEBUG dsfs dsfs/dataset.go:279 error saving: no changes
2020-10-06T20:06:45.451-0400 DEBUG base base/save.go:155 dsfs.CreateDataset: error saving: no changes
2020-10-06T20:06:45.451-0400 DEBUG lib lib/datasets.go:652 create ds error: error saving: no changes