lakeFS icon indicating copy to clipboard operation
lakeFS copied to clipboard

Merge response contains zero values

Open itaiad200 opened this issue 1 year ago • 5 comments

We don't calculate the statistics of the merge operation and return a zero list regardless of the real result:

{'summary': {'added': 0, 'changed': 0, 'conflict': 0, 'removed': 0}}

If we want to continue supporting that response body we ought to return the real numbers.

itaiad200 avatar Aug 24 '22 06:08 itaiad200

AFAIR this has always been a bit suspect. (I think) I initially added it in some misguided attempt at feature parity with Git. Later it became too hard to support and we removed it; I think one issue is that whole-range operations become tough (does the size of a range appear on the metarange?), counting double deletes is surprisingly hard to do correctly, staging deletes are (or used to be) hard to count, etc.

I think it should be safe to get rid of it. It's been all-zeroes for a long while now, and I have seen ZERO complaints. Adding the dreaded "1.0" label as this is technically a breaking change :-)

arielshaqed avatar Aug 24 '22 07:08 arielshaqed

@arielshaqed I have seen > ZERO complaints :) I don't think there's a technical reason for this not to work (meta-ranges contain key counts for each range). Staging is less relevant since we're talking about a merge operation..

ozkatz avatar Aug 24 '22 09:08 ozkatz

@arielshaqed I have seen > ZERO complaints :)

I don't think there's a technical reason for this not to work (meta-ranges contain key counts for each range). Staging is less relevant since we're talking about a merge operation..

You're right -- I was worried that we were not filling them in, but actually they are! The only issue that remains is that sometimes (hopefully usually) we replace an entire range, and then we don't know how many items it changed. We could overestimate the counts, but this would be wildly inaccurate.

Perhaps make it an option in the API...

arielshaqed avatar Aug 24 '22 09:08 arielshaqed

I think we can suggest calling 'diff' after successful merge to get a merge results and removing the results from the merge API. This way we can keep the current merge optimizations.

nopcoder avatar Aug 27 '22 11:08 nopcoder

I think we can suggest calling 'diff' after successful merge to get a merge results and removing the results from the merge API. This way we can keep the current merge optimizations.

Thinking about it over the weekend, I think that this is exactly right. There is no significant advantage to returning an exact count. Meanwhile, for many (most?) programmatic application you would want to diff in order to do any of the following efficiently:

  • Determine whether any objects were added: just diff, return after first addition.
  • Determine -"- deleted: -"- deletion.
  • Count whether >1000 objects changed by merge.

(If need be, even add a separate counting-diff operation.)

arielshaqed avatar Aug 28 '22 05:08 arielshaqed

lakeFS v0.91.0 release will return summary, but the API was updated to have this field (and members) as optional. Separate PR will be merged later when we would like to drop the summary information from API.

nopcoder avatar Jan 29 '23 13:01 nopcoder

https://github.com/treeverse/airflow-provider-lakeFS/pull/46 Upgrade AirFlow integration

nopcoder avatar Jan 29 '23 14:01 nopcoder

@nopcoder please add a reminder for x weeks to delete it from the API.

itaiad200 avatar Jan 30 '23 09:01 itaiad200

@nopcoder @itaiad200 when do we plan on introducing this breaking change?

ozkatz avatar Feb 14 '23 14:02 ozkatz

Waiting for more clients to upgrade to a version where the Summary field is Optional. It will probably take months.

itaiad200 avatar Feb 16 '23 13:02 itaiad200