ostree icon indicating copy to clipboard operation
ostree copied to clipboard

Fix handling of scratch deltas with existing commits

Open dbnicholson opened this issue 5 years ago • 16 comments

This fixes 2 bugs with handling of scratch deltas. First, if the existing commit is partial, use the scratch delta instead of the assumption that an object pull would be better. This fixes https://github.com/flatpak/flatpak/issues/3412 because it initially does a metadata only fetch to figure out some things about extra data flatpaks. Second, if the caller specifies require-static-deltas, honor it for a scratch delta.

The tests were a little tricky to put together because ostree internally will fall back to an object pull if the delta doesn't exist. I had to corrupt the superblock to make it error instead of falling back.

Closes: #2004

dbnicholson avatar Feb 12 '20 23:02 dbnicholson

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: dbnicholson To complete the pull request process, please assign cgwalters You can assign the PR to them by writing /assign @cgwalters in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot avatar Feb 12 '20 23:02 openshift-ci-robot

As I said here: https://github.com/ostreedev/ostree/issues/2004#issuecomment-585586281 I don't think this is quite right. The point of the original check was that if the ref is available at all locally its likely we have a previous version and thus a from-scratch-pull is likely inefficient.

With this change if you had a previous version of the ref and with flatpak downloading a COMMIT_ONLY of the latest version of the ref then the is-partial check will always be true, even if we have some other version of the ref so using the from-scratch is not a good idea.

alexlarsson avatar Feb 13 '20 11:02 alexlarsson

After sleeping on it, I agree. Even though we don't build scratch deltas for our OS, we'd never be able to use them with our updater since it also does a metadata only pull first.

A slightly more involved but still pretty simple check (as you're suggesting). If the current revision of partial but the parent exists and is not partial, then do an object pull as well. Actually, I could see cases where the parent is also partial because you didn't complete the previous pull and now the origin has published a new commit. It would be better to walk the ref until you find a normal commit.

So, then if there was a normal commit reachable from the ref, use objects. Otherwise, use the scratch delta. Does that make sense?

dbnicholson avatar Feb 13 '20 12:02 dbnicholson

That seems fine to me. Although I'm gonna do the workaround in flatpak, so its not a huge hurry for me for this change.

alexlarsson avatar Feb 13 '20 13:02 alexlarsson

Apologies if anyone was reviewing this. I restarted after the above discussion and force pushed over the previous series.

dbnicholson avatar Feb 13 '20 18:02 dbnicholson

Actually I don't know if this is right either. The walk over the commits parent will only succeed if the direct parent of the new commit object is available locally. If you jump multiple revisions when you do the update then it will not find the intermediate parents and thus not the grandparent.

alexlarsson avatar Feb 14 '20 10:02 alexlarsson

Ugh, you're right. I added a test where the parent is partial and the grandparent is normal, but it's entirely common that you don't have the parent at all.

I'm not sure I can think of a great way to keep this heuristic on the ostree side besides having metadata only pulls not update refs, but I'm sure that would break programs.

I think this can only be reliably handled is if the program does not update the ref with a metadata only pull. I.e., pull by revision instead of by ref. Then the heuristic could still be applied correctly - what's on the ref when you do the real pull will accurately represent the state you want.

dbnicholson avatar Feb 14 '20 13:02 dbnicholson

Another thing that might help is a flag or option that says not to update the ref. So you can fetch the commit to your repo but not update your local ref.

dbnicholson avatar Feb 14 '20 13:02 dbnicholson

@alexlarsson did you see my comment at https://github.com/flatpak/flatpak/pull/3413#discussion_r379012706?

dbnicholson avatar Feb 14 '20 14:02 dbnicholson

Ahh that's a tricky corner case. It's not at all obvious that doing a commit-only pull only would negatively affect static deltas. I guess a reflog would have been useful here. Hmm, couldn't we also use the ref-binding metadata too?

jlebon avatar Feb 14 '20 21:02 jlebon

We could probably extend the "partial" state to be:

  • not partial
  • partial metadata
  • partial content

The "use object fetch" path came about for things like fsck --delete I think - that's partial content. In a partial metadata case we'd prefer scratch pulls particularly if the client requested them?

cgwalters avatar Feb 14 '20 22:02 cgwalters

Ahh that's a tricky corner case. It's not at all obvious that doing a commit-only pull only would negatively affect static deltas. I guess a reflog would have been useful here. Hmm, couldn't we also use the ref-binding metadata too?

A git style reflog did come to mind, but I'd forgotten about the ref-binding metadata. That's a great idea.

dbnicholson avatar Feb 20 '20 16:02 dbnicholson

We could probably extend the "partial" state to be:

  • not partial
  • partial metadata
  • partial content

The "use object fetch" path came about for things like fsck --delete I think - that's partial content. In a partial metadata case we'd prefer scratch pulls particularly if the client requested them?

I think that would be useful, but I think you'd still be in the same situation for partial metadata if you had a previously not partial commit on the ref but now you can't find it because of a non-linear history. I think the only way to have the partial metadata state not throw off the heuristic is:

  1. Add an update-refs boolean pull option that defaults to TRUE which says to do the fetch but not update the refs. I could imagine several ways this could be useful (like pre-seeding a repo), but in this case it would be particularly useful since you could fetch the commit metadata and inspect it without affecting the state of the local refs. The downside is that it requires the caller to opt-in to get the desired scratch delta behavior and it's not at all obvious externally why this would make a difference. Oh, actually I don't think that would work because ostree_repo_pull_with_options doesn't provide the fetched revisions back to the caller. I guess you could stuff them into the progress variant.

  2. Dig through all the commits looking at the ref bindings as @jlebon says. I like that, although to be sure you're getting the ref corresponding to the remote you're pulling from right now you'd also have to look at the collection binding and the collection ID of the remote. Not terrible, but collection IDs aren't universal.

dbnicholson avatar Feb 20 '20 16:02 dbnicholson

Here's another rework that includes looking for matching commit bindings as a fallback if the history search is unsuccessful. The added tests indicate it's doing the right thing, but I'm not sure how comfortable I am potentially scanning for and loading all the commits in the repo just for the benefit of this heuristic.

dbnicholson avatar Feb 20 '20 21:02 dbnicholson

@dbnicholson: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot avatar Apr 12 '21 21:04 openshift-ci-robot

@dbnicholson: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/sanity 04a55cb2c81693a2a978d14516c59953c5320476 link true /test sanity

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-ci[bot] avatar Apr 06 '22 20:04 openshift-ci[bot]