e2e-testing
Audit test pipelines
This is a master tracking issue for auditing which E2E test pipelines need to remain enabled.
Beats CI pipelines
Pipeline | Main Health | Triggers | Stakeholders | Issue(s) | Removal planned |
---|---|---|---|---|---|
Docker images | ⭕ Stale | None | Robots [@cachedout and @kuisathaverat ] | ❌ | |
Fleet E2E | 🔴 Broken | Daily build | Fleet [@joshdover] | https://github.com/elastic/elastic-agent/issues/1174 | |
Observability Helm Charts | 🟢 Healthy | Daily build | Robots [@cachedout and @kuisathaverat ] | Issue located in private repo | ✅ |
K8S Autodiscover | 🟡 Flaky | Daily build | Cloud Native Monitoring [@gizas] | | |
Observability MacOS | 🔴 Broken | Daily build | Elastic Agent [@cmacknz and @jlind23 ] | https://github.com/elastic/ci/issues/705 | |
Fleet Server | ⭕ Stale | None | Fleet [@joshdover ] | https://github.com/elastic/fleet-server/issues/1927 | |
Fleet UI | ⭕ Stale | None | Fleet and Integrations [@kpollich ] | | |
Fleet CI pipelines
Pipeline | Main Health | Triggers | Stakeholders | Issue |
---|---|---|---|---|
Pipeline helper | 🔴 Broken | Push to main; PR labeled | Elastic Agent [@cmacknz and @jlind23 ] | https://github.com/elastic/elastic-agent/issues/1174 |
⚠️ If you are listed as a stakeholder, we would like to know the following:
- Should the pipeline be removed from the CI or should it remain?
  - 1.1 If the pipeline remains and is broken, what is the link to an issue tracking a fix?
  - 1.2 If the pipeline should remain, how is it monitored by the team to ensure that build artifacts are not produced when the tests fail?
Next steps
Proposed pipeline criteria
I am proposing that we remove all pipelines which do not meet any of the following criteria:
- Necessary for the ongoing health of the E2E test suite itself
- Used by a product team as a quality gateway. Concretely, this means that a failing test blocks a PR from being merged or a build artifact from being produced.
- Exists to ensure the quality of a supported product.
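The rule above can be read as a simple predicate: a pipeline is a removal candidate only if it satisfies none of the three criteria. A minimal sketch in Python follows; the `Pipeline` fields and the sample entries are hypothetical and are not taken from the audit tables.

```python
from dataclasses import dataclass


@dataclass
class Pipeline:
    """Hypothetical record for one audited pipeline (field names are assumptions)."""
    name: str
    needed_for_suite_health: bool = False   # criterion 1
    is_quality_gate: bool = False           # criterion 2: a failure blocks a PR merge or an artifact
    covers_supported_product: bool = False  # criterion 3


def should_remove(p: Pipeline) -> bool:
    """A pipeline is a removal candidate only if it meets none of the criteria."""
    return not (
        p.needed_for_suite_health
        or p.is_quality_gate
        or p.covers_supported_product
    )


# Illustrative values only; the real classification is up to each stakeholder.
examples = [
    Pipeline("example-gated", is_quality_gate=True),
    Pipeline("example-unowned"),
]
removal_candidates = [p.name for p in examples if should_remove(p)]
print(removal_candidates)
```

Meeting a single criterion is enough to keep a pipeline, which is why the check negates an `or` rather than an `and`.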
Timeline
- All existing E2E pipelines have stakeholders assigned no later than: October 1, 2022
- All stakeholders agree upon the proposed pipeline criteria no later than: October 20, 2022
- Non-conforming pipelines will be removed from Jenkins and their code will be removed from the E2E test suite beginning on: November 1, 2022
Related efforts
There is a separate effort to try and reduce the scope of E2E testing back to a point where stability can be maintained, but it is limited to tests for the Agent. That effort can be found here: https://github.com/elastic/elastic-agent/issues/1174
The MacOS Daily pipeline was originally implemented in https://github.com/elastic/e2e-testing/pull/2626 using the Orka ephemeral workers; it superseded https://github.com/elastic/e2e-testing/pull/2336
The error is something that @elastic/ci-systems might need to help with:
[2022-09-28T04:58:54.763Z] + .ci/scripts/deployment.sh create
[2022-09-28T04:58:54.887Z] Cloning into '.obs'...
[2022-09-28T04:58:55.095Z] Host key verification failed.
[2022-09-28T04:58:55.095Z] fatal: Could not read from remote repository.
[2022-09-28T04:58:55.095Z]
[2022-09-28T04:58:55.095Z] Please make sure you have the correct access rights
[2022-09-28T04:58:55.095Z] and the repository exists.
IIUC, the recent upgrade of the CI controllers enabled host key verification by default. We reported this in the past and it was partially fixed, since we no longer see the error below but a new one:

but the error now happens in a subsequent stage, when cloning a private repository -- see the console log above
It worked in the past
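For context, a common way to satisfy SSH host key verification on an ephemeral worker is to add the Git host's public keys to `known_hosts` before the clone step. The sketch below illustrates that general technique in Python; it is not the fix that was applied to the CI controllers, and the function names, the injected `scan` parameter, and the host `git.example.invalid` are all assumptions made for illustration. Only `ssh-keyscan` itself is a real OpenSSH tool.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable


def scan_host_keys(host: str) -> str:
    """Fetch the host's public keys with the OpenSSH `ssh-keyscan` tool."""
    result = subprocess.run(
        ["ssh-keyscan", "-t", "rsa,ecdsa,ed25519", host],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def trust_host(host: str, known_hosts: Path,
               scan: Callable[[str], str] = scan_host_keys) -> int:
    """Append keys for `host` that are not already listed; return how many were added."""
    known_hosts.parent.mkdir(parents=True, exist_ok=True)
    existing = set(known_hosts.read_text().splitlines()) if known_hosts.exists() else set()
    new_lines = [
        line for line in scan(host).splitlines()
        if line and not line.startswith("#") and line not in existing
    ]
    if new_lines:
        with known_hosts.open("a") as fh:
            fh.write("\n".join(new_lines) + "\n")
    return len(new_lines)


# Offline demonstration with a stand-in scanner (the key material is fake).
def fake_scan(host: str) -> str:
    return f"{host} ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFAKEKEY\n"


demo_file = Path(tempfile.mkdtemp()) / "known_hosts"
first = trust_host("git.example.invalid", demo_file, scan=fake_scan)
second = trust_host("git.example.invalid", demo_file, scan=fake_scan)
print(first, second)
```

On a real worker the default scanner would be used against the actual Git host and the worker's own `~/.ssh/known_hosts`. Scanning keys at build time amounts to trust on first use, so pinning the keys published by the hosting provider is the safer variant.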

The Docker images pipeline generates the Systemd Docker images used in the e2e tests, so we are probably the stakeholders.
@v1v Thanks, that helps. I'm also trying to figure out what it actually does so that I can figure out who the stakeholders should be. I'm code-diving right now a bit to try and get a sense of that.
Observability Helm Charts can be removed
@kuisathaverat Thanks! Regarding the Docker images -- that pipeline hasn't been executed for over a year. Does it still need to exist?
> Does it still need to exist?
It is the only way to generate those images, so it should be executed whenever they change. The images are used to test installation in a systemd environment. The main changes they can have are bumps to the systemd version or the Linux version.
@cmacknz and @jlind23 Are you tracking any issues for the flakiness in the K8s Autodiscover pipeline?
> @v1v Thanks, that helps. I'm also trying to figure out what it actually does so that I can figure out who the stakeholders should be. I'm code-diving right now a bit to try and get a sense of that.
There was an original request to test on MacOS. It was initially attempted with the AWS MacOS, but that approach was declined for various reasons:
- Cost: IIRC, once created, machines are billed for a minimum of 24 hours, see https://github.com/elastic/e2e-testing/pull/2336#issuecomment-1111883732
- Implementation: the Ansible ec2 integration didn't work well, see https://github.com/elastic/e2e-testing/pull/2336#issuecomment-1118315327
- Ephemeral Orkas were available, see https://github.com/elastic/e2e-testing/pull/2336#issuecomment-1147931612
I guess the stakeholder might be @jlind23 as he was the original requester for the MacOS in AWS
@cachedout this is the issue we will use for the first half of 8.6. @AndersonQ is already assigned to this and will work closely with you in order to get back to a better place.
@jlind23 That link seems wrong? :)
Sorry, this one - https://github.com/elastic/elastic-agent/issues/1174
> @cmacknz and @jlind23 Are you tracking any issues for the flakiness in the K8s Autodiscover pipeline?
No, it may make sense to follow up with the Observability Cloudnative monitoring team to see if they have interest in fixing these tests faster than the agent team can get to them. They have done the majority of the recent work for autodiscovery features in agent.
> No, it may make sense to follow up with the Observability Cloudnative monitoring team to see if they have interest in fixing these tests faster than the agent team can get to them.
Looping in @gizas. We are trying to stabilize the E2E test suite. Are you aware of the flakiness in the k8s autodiscover tests, and if so, is anybody on your team investigating them?
I have disabled most of the tests in the Fleet E2E suite while we eval what to do with the remaining: https://github.com/elastic/elastic-agent/issues/1174#issuecomment-1267023078
Sorry for the delayed answer, @cachedout, @cmacknz. I'm just checking the K8s Autodiscover pipeline. Can you point me to a failed run so I can have a look?
Indeed, in the past we provided some fixes.
> Observability Helm Charts can be removed
Any reason this hasn't been done yet? Seeing it fail on a few PR runs recently and couldn't find the issue to track removing these.
> Any reason this hasn't been done yet?
Hi @joshdover . The issue is this one: https://github.com/elastic/observability-robots/issues/1325
We were considering this blocked until they sorted out the future regarding charts, but TBH it's probably not a big deal if we just pull it out now if it's failing in PRs. LMK what you think.
Makes sense. I've only seen it fail once recently, but will flag it if it's more of a problem.