[WIP] Logs API approach to fix race condition due to pruning in results watcher

Open ramessesii2 opened this issue 1 year ago • 6 comments

Changes

Fixes #514 /kind bug

Submitter Checklist

These are the criteria that every PR should meet, please check them off as you review them:

  • [ ] Has Docs included if any changes are user facing
  • [ ] Has Tests included if any functionality added or changed
  • [x] Tested your changes locally (if this is a code change)
  • [x] Follows the commit message standard
  • [x] Meets the Tekton contributor standards (including functionality, content, code)
  • [x] Has a kind label. You can add a comment on this PR that contains /kind <type>. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tep
  • [x] Release notes block below has been updated with any user-facing changes (API changes, bug fixes, changes requiring upgrade notices or deprecation warnings)
  • [x] Release notes contain the string "action required" if the change requires additional action from users switching to the new release

Release Notes

Free up resources (PipelineRun/TaskRun), potentially without any race condition between pruning and log streaming, even with no grace period.

ramessesii2 avatar Feb 16 '24 01:02 ramessesii2

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

To complete the pull request process, please assign dibyom after the PR has been reviewed. You can assign the PR to them by writing /assign @dibyom in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Approvers can cancel approval by writing /approve cancel in a comment.

tekton-robot avatar Feb 16 '24 01:02 tekton-robot

The following is the coverage report on the affected files. Say /test pull-tekton-results-go-coverage to re-run this coverage report

| File | Old Coverage | New Coverage | Delta |
| --- | --- | --- | --- |
| pkg/watcher/reconciler/dynamic/dynamic.go | 69.3% | 66.0% | -3.3 |
| pkg/watcher/results/logs.go | 50.0% | 40.9% | -9.1 |

tekton-robot avatar Feb 16 '24 01:02 tekton-robot

/hold For #704

ramessesii2 avatar Feb 16 '24 01:02 ramessesii2

@ramessesii2: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-tekton-results-build-tests | 52942275a18350262c5db3a1af78c1dd33e0505c | link | true | /test pull-tekton-results-build-tests |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

tekton-robot avatar Feb 16 '24 01:02 tekton-robot

@ramessesii2: PR needs rebase.

tekton-robot avatar Apr 08 '24 14:04 tekton-robot

FYI @khrm: now that #760 has merged, this WIP can be rebased and updated to use it. I'll also cross-reference the other thread we have been working on, namely the rewrite of log storage to access S3 directly from the watcher: https://github.com/tektoncd/results/issues/763

With that change, presumably the fix for the race condition becomes much simpler or moot, as the watcher will learn directly of any error writing the logs to external storage.

I suppose it is possible that leveraging the #760 changes could still provide value as part of coordination, and of course there is the question of staging and when #763 is done (though I hope to get it prioritized on our team's end when we can next talk to Koustav).

@sayan-biswas @ramessesii2 @avinal @enarha @vdemeester FYI

gabemontero avatar Jul 05 '24 15:07 gabemontero

We can close this because we are changing the logging approach.

khrm avatar Sep 04 '24 13:09 khrm

/close

khrm avatar Sep 04 '24 13:09 khrm

And to fix the race condition, we will now use a finalizer: https://github.com/tektoncd/results/pull/797

khrm avatar Sep 04 '24 13:09 khrm
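
For readers unfamiliar with the pattern named in the last comment, here is a minimal sketch of how a finalizer can keep a run from being pruned before its logs are stored. It is not the code from #797 and uses no Kubernetes or Tekton libraries: the `Run` type, the `logsFinalizer` name, and the `Reconcile` function are hypothetical stand-ins, and `DeletionAsked` merely imitates a set `metadata.deletionTimestamp`.

```go
package main

import "fmt"

// logsFinalizer is a hypothetical finalizer name used only for this sketch.
const logsFinalizer = "example.dev/logs-stored"

// Run is a simplified stand-in for a TaskRun/PipelineRun object.
type Run struct {
	Name          string
	Finalizers    []string
	DeletionAsked bool // imitates a non-nil metadata.deletionTimestamp
	LogsStored    bool // set once the logs are safely in external storage
}

// hasFinalizer reports whether the run carries the named finalizer.
func hasFinalizer(r *Run, name string) bool {
	for _, f := range r.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

// removeFinalizer drops the named finalizer, keeping all others.
func removeFinalizer(r *Run, name string) {
	kept := make([]string, 0, len(r.Finalizers))
	for _, f := range r.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	r.Finalizers = kept
}

// Reconcile returns true once the run may actually be removed.
// While the run is live it makes sure the finalizer is present; after
// deletion is requested it releases the finalizer only when logs are stored.
func Reconcile(r *Run) bool {
	if !r.DeletionAsked {
		if !hasFinalizer(r, logsFinalizer) {
			r.Finalizers = append(r.Finalizers, logsFinalizer)
		}
		return false
	}
	if hasFinalizer(r, logsFinalizer) && r.LogsStored {
		removeFinalizer(r, logsFinalizer)
	}
	return len(r.Finalizers) == 0
}

func main() {
	run := &Run{Name: "build-1"}
	Reconcile(run) // adds the finalizer while the run is live

	run.DeletionAsked = true // the pruner requests deletion
	fmt.Println("removable before logs stored:", Reconcile(run))

	run.LogsStored = true // log streaming has finished
	fmt.Println("removable after logs stored:", Reconcile(run))
}
```

The point of the design is that the pruner can request deletion at any time, with no grace period, while the actual removal waits until the component that owns the finalizer confirms the logs are stored.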