storage icon indicating copy to clipboard operation
storage copied to clipboard

Improve error handling in archive creation

Open hanwen-flow opened this issue 8 months ago • 10 comments

hanwen-flow avatar Mar 19 '25 10:03 hanwen-flow

@mtrmac @giuseppe

FYI - I have found a different way to deal with tar file overhead in checkpoint/restore , so in the end I don't need tarWithOptionsTo as a public function. It might still be a good addition though. Most calls to TarWithOptions copy the the stream to a destination tar file.

hanwen-flow avatar Mar 19 '25 10:03 hanwen-flow

Oh and the lint task is failing, with some inscrutable failure… is that an infrastructure problem, or are we somehow throwing away the logs in CI?

mtrmac avatar Mar 19 '25 21:03 mtrmac

… but, also, I’m inclined to push for details on the “require write failures to be ignored” part. I’m not at all saying this is incorrect, but someone needs to understand that this is the case, and that we should preserve that behavior

these functions run on live containers, so another process may be mutating the file we're trying to copy out, so there can always be errors, eg. the path we're trying to archive was just removed by another process, so os.Lstat returns ENOENT.

This code could be much more resistant to races (eg. using openat(2) and friends to fix a file entry to export in just one place.), but it can't be fundamentally fixed: the container may truncate a file while we're writing it to the tar, causing a short read, and there is no way to recover from this.

With this change, we can be sure that we notice before handing back a corrupted file to the caller.

hanwen-flow avatar Mar 20 '25 08:03 hanwen-flow

these functions run on live containers, so another process may be mutating the file we're trying to copy out, so there can always be erro

I added this as a comment to TarWithOptions.

hanwen-flow avatar Mar 20 '25 08:03 hanwen-flow

Oh and the lint task is failing, with some inscrutable failure… is that an infrastructure problem, or are we somehow throwing away the logs in CI?

with some local tweaks:

$ make local-validate
make -C tests/tools GOLANGCI_LINT_VERSION=1.64.7
make[1]: Entering directory '/home/hanwen/vc/containers/storage/tests/tools'
make[1]: Nothing to be done for 'all'.
make[1]: Leaving directory '/home/hanwen/vc/containers/storage/tests/tools'
./hack/git-validation.sh
+ export GIT_VALIDATION=tests/tools/build/git-validation
+ GIT_VALIDATION=tests/tools/build/git-validation
+ '[' '!' -x tests/tools/build/git-validation ']'
+ EPOCH_TEST_COMMIT=
+ '[' -z '' ']'
++ git merge-base main HEAD
+ EPOCH_TEST_COMMIT=b87bac8fb133a3570c5b11542bf791ef550f308c
+ exec time tests/tools/build/git-validation -q -run DCO,short-subject -range b87bac8fb133a3570c5b11542bf791ef550f308c..HEAD
1.53user 4.91system 0:06.21elapsed 103%CPU (0avgtext+0avgdata 12544maxresident)k
0inputs+5616outputs (0major+643812minor)pagefaults 0swaps

Looks like it's taking 6 seconds on my 16-core machine to check if I have a signed-off line in my commits. :man_shrugging:

note that my 'main' branch is not upstream's main branch.

hanwen-flow avatar Mar 20 '25 08:03 hanwen-flow

I added this as a comment to TarWithOptions.

I like this, as well as the comments on the two functions around addFileData. That’s clear thinking and a clear rationale. Thanks!

mtrmac avatar Mar 20 '25 16:03 mtrmac

PTAL

hanwen-flow avatar Mar 24 '25 11:03 hanwen-flow

@mtrmac , @giuseppe - friendly ping.

hanwen-flow avatar Mar 26 '25 08:03 hanwen-flow

@mtrmac ping?

hanwen-flow avatar Mar 27 '25 12:03 hanwen-flow

I like this, as well as the comments on the two functions around addFileData. That’s clear thinking and a clear rationale. Thanks!

@mtrmac - would you be able to approve this? Or is there someone else who can give the 2nd LGTM?

hanwen-flow avatar Mar 31 '25 19:03 hanwen-flow

Rebased. Needs LGTM again. @giuseppe , @mtrmac

hanwen-flow avatar Apr 07 '25 08:04 hanwen-flow

@kolyshkin ping?

hanwen-flow avatar Apr 17 '25 07:04 hanwen-flow

@giuseppe - seems to need an LGTM (again?)

hanwen-flow avatar Apr 22 '25 08:04 hanwen-flow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, hanwen-flow, kolyshkin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • ~~OWNERS~~ [giuseppe,kolyshkin]

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci[bot] avatar Apr 23 '25 07:04 openshift-ci[bot]