
kubecf-ci: pull-request-resource not picking up updates to PRs

Open · bikramnehra opened this issue May 20 '20 · 10 comments

Describe the bug
We are seeing issues in the pipeline where updates to PRs do not trigger new builds. This mostly happens with older PRs.

To Reproduce
Make an update to an existing PR and see whether it gets picked up by: https://concourse.suse.dev/teams/main/pipelines/kubecf/resources/kubecf-pr

Expected behavior
Concourse should pick up all updates to PRs and build them.

Environment

  • CF Version: NA
  • Component/Plugin Name: NA
  • Others: NA

Additional context
Also see: https://github.com/telia-oss/github-pr-resource/issues/26

bikramnehra · May 20 '20 19:05

I propose to open an issue on the aforementioned GitHub repo with the following text.

Note: the only way I see to fix the issue locally is to fork the resource, change it, build our own image for it, and use that in the pipeline.

Proposed ticket

Hello. We believe we have found a serious issue in the implementation of this resource which causes it to semi-deterministically ignore eligible PRs. While at first glance it may look like #26, it is not the same.

The problem appears to be in the interaction between the internal optimization that quickly skips already-seen commits and filtering strategies like required_review_approvals, labels, etc.

The issue is that these filter strategies can make an arbitrarily old PR become visible much later, namely when such a PR changes its state, e.g. by getting the necessary approvals, receiving one of the looked-for labels, etc.

There is a high chance that by the time the PR changes state and thus becomes eligible, the already-seen threshold has moved beyond it, preventing the remainder of the Check code from seeing this PR, despite it now being eligible.

While there are workarounds (see (x) below), these are very inconvenient, and on top of that one has to actually recognize when this issue has happened, and/or be continuously on the lookout for it.

IMHO the better solution is to completely remove the already-seen optimization, so that any old PR will be caught when its eligibility changes. This could be combined with checking whether filters like required_review_approvals are in play at all, and disabling the optimization only in that case.

I can see that this may not be something you wish to do, as it may slow the check down if the set of possible PRs is ever-growing, or simply always quite large.

A suitable compromise may be the addition of a parameter which lets the user of the resource control the already-seen optimization at their discretion. This could be a simple boolean, i.e. on/off, or something like a time interval. The latter would control how far back from the current threshold old PRs are still checked for state changes. If, for example, a PR is expected to get an approval within 2 days, the parameter can be set accordingly, ensuring that all PRs within 2 days of the threshold are still checked. A special value for the interval could disable the threshold completely, should the user desire that.
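To make this concrete, here is a minimal Go sketch of what such a knob could look like; the source type, RecheckWindow field, and cutoff helper are hypothetical names invented for illustration, not anything that exists in the resource today:

```go
// Hypothetical sketch only; none of these names exist upstream.
package main

import (
	"fmt"
	"time"
)

type source struct {
	// RecheckWindow: how far behind the already-seen threshold old PRs
	// are still re-checked for state changes (approvals, labels, ...).
	// Zero keeps today's behavior; a negative value could serve as the
	// special "disable the threshold entirely" setting.
	RecheckWindow time.Duration
}

// cutoff returns the effective threshold, and whether one applies at all.
func (s source) cutoff(lastSeen time.Time) (time.Time, bool) {
	if s.RecheckWindow < 0 {
		return time.Time{}, false // optimization fully disabled
	}
	return lastSeen.Add(-s.RecheckWindow), true
}

func main() {
	// Approvals are expected within 2 days, so look 48h behind the threshold.
	src := source{RecheckWindow: 48 * time.Hour}
	cut, active := src.cutoff(time.Now())
	fmt.Println(cut, active)
	// In the check loop, the skip would then become something like:
	//   if cut, ok := src.cutoff(lastSeen); ok && !p.CommittedDate.After(cut) {
	//       continue
	//   }
}
```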


(x) Any git operation which updates the tip of the PR to a commit whose timestamp is past the threshold, e.g. pushing an empty change, or amending the head commit of the PR. Even this is subject to a race: it may require re-approval, re-labeling, etc., and while the user is doing that, the threshold may jump over the PR again because the resource happened to run a check in the meantime.
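For example, assuming the PR branch is checked out locally and the remote is named origin, the empty-change variant of this workaround is roughly:

```sh
# Create an empty commit so the PR tip gets a fresh timestamp,
# moving it past the already-seen threshold again.
git commit --allow-empty -m "Retrigger CI"
git push origin HEAD
```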

andreas-kupries · May 20 '20 19:05

For us locally, the relevant sections of the resource's code are https://github.com/telia-oss/github-pr-resource/blob/master/check.go#L36-L39 and https://github.com/telia-oss/github-pr-resource/blob/master/check.go#L65-L68
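To illustrate the interaction, here is a self-contained sketch, not the upstream code verbatim: the pullRequest type and Approved field are simplified stand-ins for the real version/filter machinery, but the ordering (already-seen skip before the eligibility filters) is the crux of the bug.

```go
// Sketch of the problematic shape of the check, not the actual code.
package main

import (
	"fmt"
	"time"
)

type pullRequest struct {
	Number        int
	CommittedDate time.Time // timestamp of the PR's tip commit
	Approved      bool      // stands in for required_review_approvals etc.
}

func check(lastSeen time.Time, pulls []pullRequest) []pullRequest {
	var out []pullRequest
	for _, p := range pulls {
		// The already-seen optimization runs first: anything at or
		// before the threshold is discarded unconditionally ...
		if !p.CommittedDate.After(lastSeen) {
			continue
		}
		// ... so the eligibility filters below never get a chance
		// to notice that an old PR has since become eligible.
		if !p.Approved {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	now := time.Now()
	// PR #42 was pushed 3 days ago and has since been approved ...
	oldPR := pullRequest{Number: 42, CommittedDate: now.Add(-72 * time.Hour), Approved: true}
	// ... but the threshold has already advanced past its tip commit.
	lastSeen := now.Add(-24 * time.Hour)
	fmt.Println(check(lastSeen, []pullRequest{oldPR})) // prints [] — #42 stays invisible
}
```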

andreas-kupries · May 20 '20 19:05

> as it may slow the check down if the set of possible PRs is ever-growing, or simply always quite large.

Given that we aim to do trunk-based development, I think it is sensible to assume that we'll keep the number of open PRs low. If we end up with over 50 open PRs at a single point in time, we are obviously failing somehow.

The biggest threat I see is from bot-created PRs, like buildpack and stemcell bumps. They just need to be either merged or closed in a timely manner.

For other PRs, I think it would be good if CI could skip Draft PRs; not sure if this is already supported or not.

jandubois · May 20 '20 20:05

Note that the text is aimed at upstream, trying to anticipate an objection. For ourselves I see no problem with disabling this optimization completely, given the number of PRs we usually have open. Especially as the resource is written in Go, i.e. in the end it is machine code that runs, not some scripting-language VM.

andreas-kupries · May 20 '20 20:05

Created upstream ticket: https://github.com/telia-oss/github-pr-resource/issues/204

andreas-kupries · May 21 '20 17:05

Things left to do:

  • [x] Update the pipeline resource definition to use our fork (see the sketch after this list)
  • [x] Fly the pipeline and see if the issue was fixed
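For reference, a rough sketch of what the resource definition change amounts to; the registry, image name, tag, and repository below are placeholders, not the actual values in our pipeline:

```yaml
# Sketch only: image and repository names are placeholders.
resource_types:
- name: pull-request
  type: docker-image
  source:
    # Image built from the SUSE/github-pr-resource fork instead of
    # the upstream teliaoss/github-pr-resource image.
    repository: registry.example.com/github-pr-resource
    tag: fork

resources:
- name: kubecf-pr
  type: pull-request
  source:
    repository: example-org/kubecf          # placeholder org/repo
    access_token: ((github-access-token))   # placeholder credential var
```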

viovanov · Jun 18 '20 16:06

This is the fix we need to apply: https://github.com/SUSE/github-pr-resource/commit/3ee0816d801a7038d6125796725aa3718c688b53

jimmykarily · Jun 22 '20 07:06

Deployed the pipeline with a manually built image from our fork. Let's monitor the pipeline for a while to see if the issue is fixed.

jimmykarily · Jun 22 '20 12:06

Not sure if this should be part of the acceptance criteria for this issue, or if it requires a new one, but I expected that external PRs would also be triggered once they are approved. This still does not seem to be the case in #967.

jandubois · Jun 24 '20 20:06

> Deployed the pipeline with a manually built image from our fork.

I added an additional commit to Andreas' PR: https://github.com/SUSE/github-pr-resource/commit/e92ede5

I've also deployed the pipeline with a locally built image: #1168

Note that the ak-kubecf-903 branch does not build because it fails some tests; here are Andreas' build instructions, which I also followed to get the working image:

> I removed "github.com/telia-oss/github-pr-resource/fakes" and the entire func TestCheck locally

jandubois · Jul 28 '20 01:07