ostree
ostree copied to clipboard
Fix handling of scratch deltas with existing commits
This fixes 2 bugs with handling of scratch deltas. First, if the existing commit is partial, use the scratch delta instead of the assumption that an object pull would be better. This fixes https://github.com/flatpak/flatpak/issues/3412 because it initially does a metadata only fetch to figure out some things about extra data flatpaks. Second, if the caller specifies require-static-deltas, honor it for a scratch delta.
The tests were a little tricky to put together because ostree internally will fall back to an object pull if the delta doesn't exist. I had to corrupt the superblock to make it error instead of falling back.
Closes: #2004
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: dbnicholson
To complete the pull request process, please assign cgwalters
You can assign the PR to them by writing /assign @cgwalters in a comment when ready.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
As I said here: https://github.com/ostreedev/ostree/issues/2004#issuecomment-585586281 I don't think this is quite right. The point of the original check was that if the ref is available at all locally its likely we have a previous version and thus a from-scratch-pull is likely inefficient.
With this change if you had a previous version of the ref and with flatpak downloading a COMMIT_ONLY of the latest version of the ref then the is-partial check will always be true, even if we have some other version of the ref so using the from-scratch is not a good idea.
After sleeping on it, I agree. Even though we don't build scratch deltas for our OS, we'd never be able to use them with our updater since it also does a metadata only pull first.
A slightly more involved but still pretty simple check (as you're suggesting). If the current revision of partial but the parent exists and is not partial, then do an object pull as well. Actually, I could see cases where the parent is also partial because you didn't complete the previous pull and now the origin has published a new commit. It would be better to walk the ref until you find a normal commit.
So, then if there was a normal commit reachable from the ref, use objects. Otherwise, use the scratch delta. Does that make sense?
That seems fine to me. Although I'm gonna do the workaround in flatpak, so its not a huge hurry for me for this change.
Apologies if anyone was reviewing this. I restarted after the above discussion and force pushed over the previous series.
Actually I don't know if this is right either. The walk over the commits parent will only succeed if the direct parent of the new commit object is available locally. If you jump multiple revisions when you do the update then it will not find the intermediate parents and thus not the grandparent.
Ugh, you're right. I added a test where the parent is partial and the grandparent is normal, but it's entirely common that you don't have the parent at all.
I'm not sure I can think of a great way to keep this heuristic on the ostree side besides having metadata only pulls not update refs, but I'm sure that would break programs.
I think this can only be reliably handled is if the program does not update the ref with a metadata only pull. I.e., pull by revision instead of by ref. Then the heuristic could still be applied correctly - what's on the ref when you do the real pull will accurately represent the state you want.
Another thing that might help is a flag or option that says not to update the ref. So you can fetch the commit to your repo but not update your local ref.
@alexlarsson did you see my comment at https://github.com/flatpak/flatpak/pull/3413#discussion_r379012706?
Ahh that's a tricky corner case. It's not at all obvious that doing a commit-only pull only would negatively affect static deltas. I guess a reflog would have been useful here. Hmm, couldn't we also use the ref-binding metadata too?
We could probably extend the "partial" state to be:
- not partial
- partial metadata
- partial content
The "use object fetch" path came about for things like fsck --delete I think - that's partial content. In a partial metadata case we'd prefer scratch pulls particularly if the client requested them?
Ahh that's a tricky corner case. It's not at all obvious that doing a commit-only pull only would negatively affect static deltas. I guess a reflog would have been useful here. Hmm, couldn't we also use the
ref-bindingmetadata too?
A git style reflog did come to mind, but I'd forgotten about the ref-binding metadata. That's a great idea.
We could probably extend the "partial" state to be:
- not partial
- partial metadata
- partial content
The "use object fetch" path came about for things like
fsck --deleteI think - that's partial content. In a partial metadata case we'd prefer scratch pulls particularly if the client requested them?
I think that would be useful, but I think you'd still be in the same situation for partial metadata if you had a previously not partial commit on the ref but now you can't find it because of a non-linear history. I think the only way to have the partial metadata state not throw off the heuristic is:
-
Add an
update-refsboolean pull option that defaults toTRUEwhich says to do the fetch but not update the refs. I could imagine several ways this could be useful (like pre-seeding a repo), but in this case it would be particularly useful since you could fetch the commit metadata and inspect it without affecting the state of the local refs. The downside is that it requires the caller to opt-in to get the desired scratch delta behavior and it's not at all obvious externally why this would make a difference. Oh, actually I don't think that would work becauseostree_repo_pull_with_optionsdoesn't provide the fetched revisions back to the caller. I guess you could stuff them into the progress variant. -
Dig through all the commits looking at the ref bindings as @jlebon says. I like that, although to be sure you're getting the ref corresponding to the remote you're pulling from right now you'd also have to look at the collection binding and the collection ID of the remote. Not terrible, but collection IDs aren't universal.
Here's another rework that includes looking for matching commit bindings as a fallback if the history search is unsuccessful. The added tests indicate it's doing the right thing, but I'm not sure how comfortable I am potentially scanning for and loading all the commits in the repo just for the benefit of this heuristic.
@dbnicholson: PR needs rebase.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@dbnicholson: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/sanity | 04a55cb2c81693a2a978d14516c59953c5320476 | link | true | /test sanity |
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.