lakeFS
Staging behaviour is unclear when uploading the exact same objects
Committing to a branch with 0 changes to objects is not allowed. However, the term "empty staging area" is not clear when the same objects are uploaded again.
Example: lakeFS has committed data with the object `foo.bar`, and there are no entries in the staging area. The same object `foo.bar` is uploaded under the same name. The list of uncommitted changes doesn't show the object and the UI commit button is disabled, which makes sense. However, I was able to commit using `lakectl`. The commit diff view shows no changes.
This occurs in the DB (postgres) implementation of graveler. The KV implementation isn't finished yet, but it will be able to resolve this.
Maybe the solution is to check the uploaded object against the committed one during upload, and to return an error if they have the same identity?
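A minimal sketch of that proposal, using a toy in-memory model rather than graveler itself. The names `Branch`, `UnchangedObjectError`, `upload` and `commit` are hypothetical and only illustrate the idea of comparing the identity at upload time:

```python
class UnchangedObjectError(Exception):
    """Raised when an upload would not change the committed object."""


class Branch:
    """Toy in-memory branch (hypothetical, not a lakeFS API)."""

    def __init__(self):
        self.committed = {}  # path -> identity of the committed entry
        self.staged = {}     # path -> identity of the staged entry

    def upload(self, path, identity):
        # Proposed check: one extra read of committed data on every upload.
        if self.committed.get(path) == identity:
            raise UnchangedObjectError(path)
        self.staged[path] = identity

    def commit(self):
        # An empty staging area means there is nothing to commit.
        if not self.staged:
            raise ValueError("no changes to commit")
        self.committed.update(self.staged)
        self.staged.clear()
```

With this check, re-uploading `foo.bar` with the same identity never reaches the staging area, so "empty staging area" and "no changes" mean the same thing again. The price is the additional lookup in `upload`.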
I've tested the above and I think we need to update this issue, as it is more of a non-trivial bug. We are able to update the last modified time and still see no uncommitted data on the branch. To reproduce:
- Upload an object.
- Commit changes.
- Upload the same object again. The last update time shown in the UI/CLI is updated, but no change is listed in the UI's "Uncommitted changes" or in the CLI using 'diff'.
Discussed the issue with @ozkatz - it seems that the current implementation is the one that introduces fewer issues, as we do not include the timestamp as part of the checksum. Verifying the checksum when we set the data would slow the main operation flow with another read access. So for now, keeping the entry in staging and dropping the update after the commit is the accepted behaviour.
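To illustrate why a re-upload does not count as a change, here is a small sketch in which the identity is derived from the content checksum and size only. The `Entry` fields and the `identity` function are assumptions made for the example, not the actual lakeFS entry layout or hashing scheme:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # Hypothetical entry shape, chosen for illustration only.
    content_checksum: str
    size: int
    last_modified: int  # Unix seconds


def identity(entry: Entry) -> str:
    """Hash the fields that define the object; the timestamp is left out."""
    h = hashlib.sha256()
    h.update(entry.content_checksum.encode())
    h.update(b"\x00")  # separator so the two fields cannot run together
    h.update(str(entry.size).encode())
    return h.hexdigest()


first_upload = Entry("abc123", 42, last_modified=1_000)
second_upload = Entry("abc123", 42, last_modified=2_000)

# Different entries (the timestamp moved), same identity.
print(first_upload != second_upload)
print(identity(first_upload) == identity(second_upload))
```

Under these assumptions the second upload refreshes the last modified time, yet any comparison based on identity reports no change.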
I completely agree with this existing behaviour, but...
Not sure why the GUI commit button should be disabled. I mean, I understand why it happens the way it's implemented... I just don't understand why the GUI blocks something that I could legitimately do from the CLI or the API.
@arielshaqed looking into this behaviour - I think we can have a better UX here.
@arielshaqed right as always - the UI does the equivalent of `lakectl diff lakefs://my-repo/main`. This is the reason why the button is disabled.
`lakectl commit` calls commit directly, which performs the commit and, in the above case, assumes that the entry found in staging overrides the entry in the base commit. The code assumes that you added the entry because you already checked the identity (which is not true in this case).
This is why the behaviour in the UI doesn't match.
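The mismatch can be reproduced with a toy model. This is only a sketch: `ToyBranch` and its methods are hypothetical stand-ins for the two code paths described here, not lakeFS code.

```python
class ToyBranch:
    """In-memory stand-in for a branch (hypothetical, for illustration)."""

    def __init__(self):
        self.committed = {}  # path -> identity in the base commit
        self.staged = {}     # path -> identity in the staging area
        self.commits = 0

    def upload(self, path, identity):
        # Uploads always land in staging; nothing is compared here.
        self.staged[path] = identity

    def diff(self):
        # UI path: list only staged entries whose identity differs
        # from the committed one.
        return sorted(
            path
            for path, ident in self.staged.items()
            if self.committed.get(path) != ident
        )

    def commit(self):
        # Direct commit path: any staged entry is assumed to be a change.
        if not self.staged:
            raise ValueError("no changes to commit")
        self.committed.update(self.staged)
        self.staged.clear()
        self.commits += 1


branch = ToyBranch()
branch.upload("foo.bar", "same-identity")
branch.commit()
branch.upload("foo.bar", "same-identity")  # same object, uploaded again

print(branch.diff())  # nothing to show, so a diff-driven button stays disabled
branch.commit()       # the direct path still succeeds
print(branch.commits)
```

In this model the two paths disagree because `diff` compares identities while `commit` only checks that staging is non-empty. Making both use the same test would align the UI with the CLI.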