Lock base version and all plugins and add `bump-jenkins` job
Periodically, the latest plugins and the version of Jenkins offered by https://github.com/openshift/jenkins can get out of sync to the point where Jenkins won't come up. Let's run tests on some schedule (weekly?) that verify that the current configuration in this repo is able to at least bring Jenkins up without errors.
This will mean that if we ever need to redeploy in haste we'll be ready.
We also need to start re-building the jenkins container image periodically to pick up new plugin updates so we don't freeze for 6 months and potentially carry security issues.
This could be a periodic job that triggers a build in openshift.
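A minimal sketch of what such a periodic trigger could look like as a declarative Jenkinsfile; the BuildConfig name, the assumption that the job runs in (or is logged into) the pipeline namespace, and the schedule are all placeholders rather than what the pipeline actually uses:

```groovy
// Hypothetical sketch: rebuild the Jenkins S2I image on a schedule so plugin
// updates get picked up. BuildConfig name and cron schedule are assumptions.
pipeline {
    agent any
    triggers {
        cron('H H * * 0')  // roughly once a week
    }
    stages {
        stage('Rebuild Jenkins image') {
            steps {
                // assumes the job is already authenticated against the right namespace
                sh 'oc start-build buildconfig/jenkins --follow --wait'
            }
        }
    }
}
```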
Let's repurpose this to go a step further and lock down all our plugins and the base Jenkins version itself and e.g. have a job that opens a PR to bump them with a link to documentation on how to test the PR in staging. (And maybe eventually it directly instruments the test in staging itself... or maybe better we request a separate namespace for this.)
This will require some investigation to figure out how we can run depsolving ourselves.
This needs to account for the different clusters we support running different OpenShift versions (and thus having different Jenkins base versions). We could choose to track the older of the two, or if it's fully mechanical, maintaining two separate base versions + plugins lists shouldn't be too bad either.
We've been discussing possible ways to implement this. One way is to have a job that removes the versions altogether, which would just pull all the latest versions, attempt a build like that and "see if it works" (loaded statement).
I tried to manually build the Jenkins instance without any versions and it failed to load because there seems to be an issue with the openshift/jenkins code. I opened https://github.com/openshift/jenkins/issues/1709 regarding it.
In the meantime, we might have to find an alternative way to automate this.
Was chatting with @marmijo about this today. One suggestion is to have the job on the prod Jenkins instrument the Jenkins on the stage cluster. Something like:
- Clone repo, edit lockfile, push to coreosbot-releng branch
- Start jenkins-s2i build on stage cluster, pointing at the ref
- Monitor build and deployment of new Jenkins
- Do a test FCOS build in the new Jenkins (see https://github.com/coreos/fedora-coreos-pipeline/blob/main/HACKING.md#triggering-builds-remotely)
- If it succeeds, open PR with branch
We'd need to figure out how to set up authentication across the clusters.
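For reference, remotely triggering a test build on the stage Jenkins could look roughly like the sketch below, assuming an API token for the stage instance is stored as a Jenkins credential; the job name, the STREAM parameter, and the credential ID are assumptions (see HACKING.md for the supported way to trigger builds remotely):

```groovy
// Hypothetical sketch: trigger a test FCOS build on the stage Jenkins over its
// remote build API. Job name, parameter, and credential ID are assumptions.
withCredentials([usernamePassword(credentialsId: 'stage-jenkins-api-token',
                                  usernameVariable: 'JENKINS_USER',
                                  passwordVariable: 'JENKINS_TOKEN')]) {
    sh '''
        curl -sSf -X POST -u "${JENKINS_USER}:${JENKINS_TOKEN}" \
             --data-urlencode STREAM=testing-devel \
             https://jenkins-fedora-coreos-pipeline.apps.ocp.stg.fedoraproject.org/job/build/buildWithParameters
    '''
}
```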
Another approach is to instrument the Jenkins from which the job is running itself, but if the plugins break so hard that the Jenkins never comes back up, we'll never hear back from that job.
And yet another approach is to bring up a second Jenkins in the same namespace. I think for this we'd need to make sure we get the resource naming right so there's no conflicts between the two, but it could get messy. I think it's more conventional to use a separate namespace/cluster for tests like this.
@aaradhak, @jlebon and I had a long discussion about this today.
> - Start jenkins-s2i build on stage cluster, pointing at the ref
> - Monitor build and deployment of new Jenkins
> - Do a test FCOS build in the new Jenkins (see https://github.com/coreos/fedora-coreos-pipeline/blob/main/HACKING.md#triggering-builds-remotely)
It would probably be beneficial to split the workload of this job into two separate jobs. We should also consider renaming the job to bump-jenkins-plugins instead of bump-jenkins since it more accurately represents the purpose of the job:
- `bump-jenkins-plugins`: would update the plugins in `plugins.txt` to their latest versions and open a PR
- `test-jenkins`: would take the contents from a PR (or even just a git ref) and test it on the FCOS staging cluster by building the Jenkins instance using said contents. The job can then report the results back to GitHub.
  - Would be manually run to prevent interruptions to the FCOS stage pipeline, just in case someone is running something there.
  - This job could be used for more than just this specific case. We would be able to use it to test any changes to the configuration of Jenkins through just a PR to fedora-coreos-pipeline.
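As a rough sketch of the first job's core step, assuming we simply take the latest available version of each plugin (this deliberately ignores dependency resolution, which is discussed further down) and that it runs in a pipeline context where `readFile`/`writeFile` are available:

```groovy
// Hypothetical core of bump-jenkins-plugins: look up the latest version of every
// plugin in plugins.txt from the public Jenkins update center and rewrite the
// file. Dependency handling and comment/blank-line handling are omitted.
import groovy.json.JsonSlurper

def updateCenter = new JsonSlurper().parse(
    new URL('https://updates.jenkins.io/current/update-center.actual.json'))

def bumped = readFile('plugins.txt').readLines().collect { line ->
    def (name, version) = line.tokenize(':')
    def latest = updateCenter.plugins[name]?.version ?: version
    "${name}:${latest}"
}
writeFile(file: 'plugins.txt', text: bumped.join('\n') + '\n')
```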
So the workflow would look something like this:
- [bump-jenkins-plugins]: clone repo, edit lockfile, push to coreosbot-releng fork of f-c-p
- [bump-jenkins-plugins]: open a PR against coreos:main
- [CoreOS team]: manually runs/enables `test-jenkins` from the PR
- [test-jenkins]: update jenkins-s2i BuildConfig on FCOS stage cluster to point at coreosbot-releng fork (contents of PR)
- [test-jenkins]: build jenkins-s2i BuildConfig (will rebuild Jenkins)
- [test-jenkins]: build FCOS inside a cosa pod and run kola qemu tests
- [test-jenkins]: update commit status on PR
- [CoreOS team]: merge PR, OR triage any failures and rerun
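The cluster-facing part of test-jenkins could reduce to a couple of oc invocations, roughly like the sketch below; the branch name is an assumption, while the BuildConfig and fork names come from the workflow above:

```groovy
// Hypothetical sketch of the cluster-facing part of test-jenkins. Assumes the job
// is already authenticated against the FCOS stage cluster.
sh '''
    # point the S2I build at the PR contents on the coreosbot-releng fork
    oc patch bc/jenkins-s2i --type=merge -p '{
        "spec": {"source": {"git": {
            "uri": "https://github.com/coreosbot-releng/fedora-coreos-pipeline",
            "ref": "bump-jenkins-plugins"}}}}'

    # rebuild the Jenkins image and follow the build to completion
    oc start-build bc/jenkins-s2i --follow --wait
'''
```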
Ideally, we'd like to use the dependency resolution that occurs upstream in openshift/jenkins, but it was removed from the upstream code earlier this year. Our current plan is to investigate whether this resolveDependencies function will affect the base Jenkins plugin packages in openshift/jenkins, or whether it will only install the dependencies required by the plugins in our plugins.txt. More specifically, if one of our add-on plugins requires a more recent version of a base plugin, will resolveDependencies update that base plugin's version? Once we have more information, we'll try to open a discussion with the team that manages openshift/jenkins to explore re-enabling this function in that code.
After building a custom Jenkins image that was modified to run resolveDependencies, we found that it will install all dependencies (even optional ones) of the plugins we list in plugins.txt. However, even if a dependency is listed in base-plugins.txt, it will update its version to the latest required one.
We saw this today for example:
Examining optional dependency github-branch-source
Optional dependency github-branch-source already installed, need to determine if it is at a sufficient version
Upgrading previously downloaded plugin github-branch-source at 1725.vd391eef681a_e to 1732.v3f1889a_c475b_
Downloading plugin: github-branch-source from https://updates.jenkins.io/download/plugins/github-branch-source/1732.v3f1889a_c475b_/github-branch-source.hpi
The dependency plugins that fall into that category are all being updated to their latest versions in a pending PR: https://github.com/openshift/jenkins/pull/1697
How many dependencies in the base list were updated?
There were 7 dependencies already installed by base-plugins.txt
I'm not sure if this answers my question or not. Let me ask a different way: in the resulting image, for the plugins in the base set, how many have a version that differs from what was in base-plugins.txt (at the git commit the S2I image is based on)?
7 plugins. They were already installed by base-plugins.txt and were updated as a result of being dependencies of ones we listed in plugins.txt.
Upgrading previously downloaded plugin credentials at 1254.vb_96f366e7b_a_d to 1271.v54b_1c2c6388a_
Upgrading previously downloaded plugin github-branch-source at 1725.vd391eef681a_e to 1732.v3f1889a_c475b_
Upgrading previously downloaded plugin kubernetes-client-api at 6.4.1-215.v2ed17097a_8e9 to 6.8.1-224.vd388fca_4db_3b_
Upgrading previously downloaded plugin okhttp-api at 4.11.0-145.vcb_8de402ef81 to 4.11.0-157.v6852a_a_fa_ec11
Upgrading previously downloaded plugin bouncycastle-api at 2.28 to 2.29
Upgrading previously downloaded plugin ssh-credentials at 305.v8f4381501156 to 308.ve4497b_ccd8f4
Upgrading previously downloaded plugin workflow-api at 1213.v646def1087f9 to 1215.v2b_ee3e1b_dd39
Aren't we missing out on the basic-branch-build-strategies plugin dependencies here?
basic-branch-build-strategies depends on branch-api:2.1092.vda_3c2a_a_f0c11,scm-api:672.v64378a_b_20c60,structs:324.va_f5d6774f3a_d
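For what it's worth, a claim like that can be double-checked against the update center metadata with a quick, illustrative script such as:

```groovy
// Hypothetical one-off check: print the dependencies the Jenkins update center
// declares for basic-branch-build-strategies, marking the optional ones.
import groovy.json.JsonSlurper

def uc = new JsonSlurper().parse(
    new URL('https://updates.jenkins.io/current/update-center.actual.json'))
uc.plugins['basic-branch-build-strategies'].dependencies.each { dep ->
    println "${dep.name}:${dep.version}${dep.optional ? ' (optional)' : ''}"
}
```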
Well, it was already listed in base-plugins.txt at a higher version, so it didn't actually upgrade it.
OK that's great. I think this could be a viable path forward then.
For the record, what I was worried about was how much drift we'd be introducing in the base plugin set; the more we drift from what was built and tested upstream, the more likely we are to hit issues. In this case, it seems like the changes are minimal (though of course in practice it'll depend on how stale the S2I image in the cluster we're running on is).
I think the next steps here are to discuss with openshift/jenkins team. One way to do this is to open a PR that allows conditionally turning back on dependency resolution.
Checked with the OpenShift team on the reason why the resolveDependencies function was disabled.
Apparently they had several issues with its plugin resolution, which produced incompatible plugins that had to be resolved manually.
So now they run a server on openshift, have Jenkins upgrade all of the plugins, and then run https://github.com/openshift/jenkins/blob/master/scripts/jenkins-script-console.txt in the script console to get a list of all of the upgraded plugins and their versions. Looks like this approach works better for them.
Ack, thanks. This is good info.
So where does this leave us? It sounds like resolveDependencies is probably not what we want either, even if it were still available. I'd be OK following the same strategy of using the built-in plugin manager, but that's probably harder to script (did you investigate that approach already?). I'm also OK trying out the dumb "bump to latest" approach we have so far, but that'd require carrying the resolveDependencies buildconfig we have, which was meant as a temporary testing hack.
I haven't looked into the process of using the built-in plugin manager. If we do implement the testing hack of using resolveDependencies, we can add that hack to the test-jenkins job for testing the current bump-jenkins-plugins job.
OK, looks like Jenkins has an API for installing/updating plugins: $JENKINS_URL/pluginManager/installNecessaryPlugins (see the docs at $JENKINS_URL/pluginManager/api/). You can do a POST with some XML and it'll install the plugins. See e.g. some random GitHub code using it: https://github.com/wasanthag/ansible-jenkins/blob/82ec213b5423bfb50d804ef9a4feac381b2fe8aa/tasks/configure-jenkins.yml#L36 (you can find others by doing a GitHub search for the API name).
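Concretely, such a POST could look something like the sketch below from a pipeline step; the XML payload format is what the endpoint expects, while the credential ID and the chosen plugin are only illustrative:

```groovy
// Hypothetical sketch: ask Jenkins to install/update a plugin (and whatever it
// needs) via the pluginManager API. Credential ID and plugin name are assumptions.
withCredentials([usernamePassword(credentialsId: 'jenkins-api-token',
                                  usernameVariable: 'JENKINS_USER',
                                  passwordVariable: 'JENKINS_TOKEN')]) {
    sh '''
        curl -sSf -X POST -u "${JENKINS_USER}:${JENKINS_TOKEN}" \
             -H 'Content-Type: text/xml' \
             -d '<jenkins><install plugin="github-branch-source@latest" /></jenkins>' \
             "${JENKINS_URL}/pluginManager/installNecessaryPlugins"
    '''
}
```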
So we could use this to request the plugins we own to be updated to the latest and then gather the final list of plugins that Jenkins resolved (e.g. using the same script that openshift/jenkins uses) and open a PR with that. WDYT?
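Gathering the resolved list afterwards could then be a small script-console snippet in the spirit of the openshift/jenkins one, e.g.:

```groovy
// Hypothetical script-console snippet: dump the final plugin set in plugins.txt
// format (name:version), sorted by name.
Jenkins.instance.pluginManager.plugins
    .sort { it.shortName }
    .each { println "${it.shortName}:${it.version}" }
```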
OK, I'll need to look into using this API for updating the plugins. This would require a change to the existing approach for how the plugins are updated.
I think a lot of the code in #917 is still necessary. The main change is that the new plugins.txt would now be constructed from Jenkins itself. Maybe we can have a higher-bandwidth chat about it.
Yea sure, we can have a discussion. With the existing code, we can create a PR with the jenkins plugin updates - https://github.com/coreosbot-releng/fedora-coreos-pipeline/pull/2
https://jenkins-fedora-coreos-pipeline.apps.ocp.stg.fedoraproject.org/job/bump-jenkins-plugins/49/console
As you mentioned, the main change in the new approach would be in the part of updating the plugins.
Met with @aaradhak and @marmijo. We're reducing the scope of this to just opening a PR with the updated plugin versions. This little bit of automation is still helpful since then a reviewer can just test it in staging by pointing the buildconfig at the PR. A test-jenkins job would be cool but much more work. We'd like to instead focus on other pipeline priorities for now.
As a first pass, I'd be cool with just cleaning up jobs/bump-jenkins-plugins.Jenkinsfile in https://github.com/coreos/fedora-coreos-pipeline/pull/917 and starting with that (i.e. not try to check compatibility yet). It's useful to have a PR open that we can easily oc start-build from in the stg instance and tweak the versions that break.