acceptance-test-harness

Click 'Check now' when the list of available plugins hasn't been refreshed yet

[Open] Pldi23 opened this issue 2 years ago • 7 comments

Fix update center usage in ATH

CloudBees CI reported many flaky tests caused by an issue with the Update Center. There are two types of failure, each with a typical stack trace, and both come from @WithPlugins: checkForUpdates_stacktrace.txt and tickPluginToInstall_stacktrace.txt.

Pldi23, Sep 06 '22 15:09

Can you explain a bit more what this is fixing?

timja, Sep 06 '22 16:09

@jtnord @jmdesprez If I remember correctly, this is the solution we discussed last week; please review it and correct me if I'm wrong.

While investigating, I found a few odd things, and it's not clear to me whether they are OK:

  1. After our meeting I thought that we didn't want to click 'Check now' if MockUpdateCenter did not start, but with this change the local server starts fine even when I run a test without an internet connection (so we still click 'Check now' regardless of whether there is an internet connection). Is my expectation wrong here?

  2. When I run an ATH test locally with Wi-Fi switched off, I expect MockUpdateCenter to work and the list of available plugins to be non-empty, but it is empty [Screenshot 2022-09-06 at 19 02 52].
    Am I missing something here too?

  3. Analysing the failure logs, it seems that no ATH test annotated with @WithPlugins uses MockUpdateCenter at all; every test log contains the line `Jul 20, 2022 9:28:10 AM org.jenkinsci.test.acceptance.update_center.MockUpdateCenter ensureRunning WARNING: found an unexpected number of update sites: [file:/root/ath/target/jenkins4850299043136569497home/war/WEB-INF/plugins/update-center.json, https://jenkins-updates.cloudbees.com/update-center/envelope-core-cm/update-center.json]`, which comes from https://github.com/jenkinsci/acceptance-test-harness/pull/913/files#diff-cfb2fe3a56bdb573bba0e66a1af029d251f2068c3b2e95f380c3eff8f9c59ec6R97 . It means that ensureRunning returns early and the server does not start. Is that expected? (It also means that locally I'm running a different ATH setup and can't fully reproduce the issue. Could I somehow set up the same set of update sites locally, for example with a property file or a variable?)

Pldi23, Sep 06 '22 16:09

> Can you explain a bit more what this is fixing?

@timja I added a description and a comment. For context: last week the FlakeBusters team had a meeting with James where we discussed that all ATH tests are flaky because everything that uses the Update Center is flaky, and we shared some thoughts on how we could reduce flakiness when using @WithPlugins. This is a draft PR to discuss the open questions in detail. Feel free to share your thoughts with us :)

Pldi23, Sep 06 '22 16:09

> 3. Analysing the failure logs, it seems that no ATH test annotated with @WithPlugins uses MockUpdateCenter at all; every test log contains the line

Because you are using a CloudBees Jenkins build, which has multiple update centers and uses its own instead of the mock update center.

To test the code change, you will need to run it against both an open-source Jenkins build (which has only one update site and will not show that line) and a CloudBees build.
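To make the reply concrete, here is a minimal sketch of the kind of guard that produces the warning quoted above. It assumes the check is "start the mock only when Jenkins has exactly one update site"; the class `UpdateSiteGuard` and method `shouldStartMock` are hypothetical and are not the actual `MockUpdateCenter.ensureRunning` code from PR 913.

```java
import java.util.List;
import java.util.logging.Logger;

// Hypothetical sketch, not ATH code: models why the mock update center is
// skipped on builds that ship more than one update site.
final class UpdateSiteGuard {
    private static final Logger LOGGER = Logger.getLogger(UpdateSiteGuard.class.getName());

    private UpdateSiteGuard() {}

    /** Returns true only when the single update site can be replaced by the mock. */
    static boolean shouldStartMock(List<String> updateSiteUrls) {
        // Assumption: exactly one site is expected (an open-source Jenkins build).
        if (updateSiteUrls.size() != 1) {
            // e.g. a CloudBees build: an offline site inside the WAR plus an online one
            LOGGER.warning("found an unexpected number of update sites: " + updateSiteUrls);
            return false;
        }
        return true;
    }
}
```

With the two URLs from the log above, `shouldStartMock` returns `false` and logs the same kind of message, so the mock server never starts; with a single site it returns `true`.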

jtnord, Sep 07 '22 09:09

> Can you explain a bit more what this is fixing?

At startup, Jenkins (OSS) refreshes the update sites. This is asynchronous, so it may or may not have happened yet, but when testing OSS Jenkins the MockUpdateCenter will have been activated, so refreshing the metadata is a good thing here.

In CloudBees products there is more than one update site, so the mock update center is never used. Additionally, the products bundle the plugins that we support inside the WAR for offline users, so that no network interaction is required, and they have an UpdateSite that reflects this. This UpdateSite is refreshed as part of product startup (in an Initializer), so its data is always up to date. However, there is another, online UpdateSite that provides plugins we do not support.

When clicking 'Check now', the tests become flaky because this online update center is, well, online: network or server issues can make the check fail. But as we do not support the plugins from this UpdateSite, we do not care whether the data it serves is out of date for the tests we run, since all the tests use plugins that come from the offline UpdateSite.

So by removing the forced check, we can make the tests much, much less flaky.

The forced check should still be used for the MockUpdateCenter, but where and when it is used could be improved.
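A minimal sketch of the decision this PR is after, under stated assumptions: keep the refresh when the mock update center is in use, and otherwise force 'Check now' only when a plugin the test needs is missing from the available list. `CheckNowPolicy`, `shouldCheckNow`, `ensureAvailable` and the `checkNow` supplier are hypothetical names that model the logic; they are not ATH APIs.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical model of the "Check now" decision discussed above; not ATH code.
final class CheckNowPolicy {

    private CheckNowPolicy() {}

    /**
     * mockUpdateCenterRunning: the local mock update center is in use (OSS builds).
     * available: plugin names currently listed as available.
     * required: plugin names the test needs (e.g. from @WithPlugins).
     */
    static boolean shouldCheckNow(boolean mockUpdateCenterRunning,
                                  Set<String> available, List<String> required) {
        if (mockUpdateCenterRunning) {
            return true; // per the thread, keep the forced check for the mock update center
        }
        // Otherwise only contact the (possibly flaky) online site when something is missing.
        return required.stream().anyMatch(plugin -> !available.contains(plugin));
    }

    /** Returns the available plugins, invoking checkNow (the "Check now" click) only when needed. */
    static Set<String> ensureAvailable(boolean mockUpdateCenterRunning,
                                       Set<String> available, List<String> required,
                                       Supplier<Set<String>> checkNow) {
        return shouldCheckNow(mockUpdateCenterRunning, available, required)
                ? checkNow.get()
                : available;
    }
}
```

On a CloudBees WAR whose offline site already lists every required plugin, `shouldCheckNow(false, ...)` returns `false`, so the online site is never contacted. A plugin that is missing from the offline site still triggers the refresh, and that is where an offline run would still fail in tickPluginToInstall.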

jtnord, Sep 07 '22 09:09

> > 3. Analysing the failure logs, it seems that no ATH test annotated with @WithPlugins uses MockUpdateCenter at all; every test log contains the line

> Because you are using a CloudBees Jenkins build, which has multiple update centers and uses its own instead of the mock update center.

> To test the code change, you will need to run it against both an open-source Jenkins build (which has only one update site and will not show that line) and a CloudBees build.

I ran these changes (cf72035) locally against a Jenkins build. The changes don't seem to break anything, but when I run without an internet connection, the test fails in tickPluginToInstall because the list of available plugins is empty even after a refresh. I also used the test selector to run tests with an internet connection against a CloudBees build ~~(by the way, how could I run it against CloudBees locally?)~~, and that doesn't break anything either.

Pldi23, Sep 08 '22 08:09

@jtnord, @jmdesprez OK, I completed testing on both the Jenkins WAR and the CloudBees WAR. With this change:

  • on the Jenkins WAR, ATH works as before; no regression found.

  • on the CloudBees WAR, we will most likely never click ‘Check now’, because during my testing on different CloudBees WARs, with and without a connection, the list of available plugins was already fully populated, so we don’t even need to refresh plugins.

But when the internet connection is off and @WithPlugins needs to install a plugin that is not in the local offline update center, the test fails with tickPluginToInstall_stacktrace.txt. What should our next steps be? Do we have any initial ideas on how to deal with these plugins? (Do we want an additional UC for them, include them somehow in the offline UC, or something else?)

I would also suggest merging this fix to reduce the flakiness caused by ‘Check now’ while we think about the tickPluginToInstall issue. What are your thoughts?

Pldi23, Sep 13 '22 11:09

Replaced by https://github.com/jenkinsci/acceptance-test-harness/pull/1185

jtnord, Jun 05 '23 14:06