smoke tests: v2

Open endorama opened this issue 9 months ago • 3 comments

This issue consolidates multiple recent issues related to smoke tests and their Go siblings, the functional tests. It will act as a meta issue to bridge the gaps between the current smoke tests (v1) and v2. v2 will reuse the current setup prepared for functionaltests; functionaltests will be migrated to smoke tests v2 and then removed.

We know that our current smoke test framework has some limitations that have slowed down some bug fixes and feature implementations.

In parallel we started building some more complex tests after 8.15.0 to test ILM and DSL behavior in different APM Server versions. For these tests we decided to build a new Go test framework that we anticipate could replace the smoke tests one over time.

We are at a stage where most of the features in the smoke tests are implemented in the functional tests, so we can consider moving to a v2 of the smoke tests that integrates the two test suites in a single place. We can partition test runs using standard go test flags and patterns.

We can also take the chance to consolidate our current smoke testing strategy. The issues linked here still need to be harmonized/reworked to align with this direction; as of now there is some confusion around functionaltests and what they cover.

Approach:

  1. consolidate our current smoke tests, fixing what's broken and covering all use cases defined that do not require additional feature development
  2. migrate v1 smoke test cases to v2, one at a time https://github.com/elastic/apm-server/issues/16130
  3. migrate the relevant functionality of the current functional tests to testing/smoke/v2

Requirements to make the migration successful:

  1. we need to fetch the latest versions https://github.com/elastic/apm-server/issues/16200
  2. we need to define all the scenarios we expect https://github.com/elastic/apm-server/issues/15844
  3. we need a way to create custom ILM policies to apply to APM data streams, this is still

Benefits from this work:

  1. adding new test scenarios https://github.com/elastic/apm-server/issues/16146
    1. we want to test the reroute processor use case https://github.com/elastic/apm-server/issues/14061
    2. we want to test ILM behaviour https://github.com/elastic/apm-server/issues/13898#issuecomment-2326277518
  2. manage ES deprecation info https://github.com/elastic/apm-server/issues/16199#issuecomment-2729238398

Where we are now:

  • the set of tests to run is discussed in https://github.com/elastic/apm-server/issues/15844 and mostly finalized
  • on smoke tests:
    • we need to address https://github.com/elastic/apm-server/issues/16179
    • we need to address https://github.com/elastic/apm-server/issues/16199
  • on functional tests:
    • we need to add tests for the ILM upgrades/use case https://github.com/elastic/apm-server/issues/13898
    • we need to automate fetching the latest version https://github.com/elastic/apm-server/issues/16200
    • we need to add APM standalone tests from 7.x, WIP
    • https://github.com/elastic/apm-server/pull/15310 is probably superseded by https://github.com/elastic/apm-server/pull/15960 but we should recover the apm-data migration test case

endorama avatar Mar 17 '25 17:03 endorama

@endorama when originally discussing the functional tests, one of the goals was to be able to run certain tests across several versions quickly and inexpensively. E.g., with the switch to apm-data in ES and the switch to DSL, we experienced different behavior when upgrading from v7.17 -> 8.15, v8.0 -> 8.15, v8.7+ -> 8.15, and v8.13+ -> 8.15. This was due to some changes in apm-server behavior during the 8.x lifetime. The classic smoke tests usually run upgrades from the last major/minor versions to the current versions; we do not test upgrades from much older minors of the same major. When merging the functionaltests, which specifically focus on the data template, mapping and lifecycle setup, with the general smoke tests, what is your proposal for achieving the test coverage we were aiming for with the functional tests while not making the overall smoke tests inefficient?

simitt avatar Mar 18 '25 07:03 simitt

@simitt could you clarify "inefficient"? What do we want to optimize for? We have a trade-off here: the current smoke tests are shallower and faster, the current functional tests are deeper and take a bit longer (compared to each other), but we are also introducing more tests.

If the concern is the duration of a single test, the currently working smoke tests take ~10-12 minutes. Functional tests in the single upgrade scenario take ~16 minutes. I think we can consider them comparable. This would add more complete ingestion + data lifecycle checks + log assertions to the current smoke test scenario. To me this is worth the trade-off.

If our concern is the overall duration, there are 2 strategies that I considered:

  1. parallelizing all tests: we can run each test in parallel; the functional tests already use this strategy and were developed to be able to run in parallel.
  2. filtering the tests: we can leverage command line flags (or build tags, but I'm not keen on them) to filter which tests we want to run. For example, running only SNAPSHOT tests or only BC ones, or having dedicated version upgrade tests that run only when specified.
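
To make the two strategies concrete, here is a minimal sketch in plain Go, not the actual framework code. The `scenario` type, the `-target` flag, the `snapshot`/`bc` labels and the scenario names are all hypothetical and exist only for illustration; the version pairs are borrowed from the upgrade paths mentioned earlier in this thread.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
	"sync"
)

// scenario is a hypothetical description of one test case.
// The real framework's types are not shown in this thread.
type scenario struct {
	Name   string
	Target string // assumed labels: "snapshot" or "bc"
	From   string // version to upgrade from
	To     string // version to upgrade to
}

// filterScenarios keeps the scenarios whose Target matches target
// (case-insensitive). An empty target selects every scenario.
func filterScenarios(all []scenario, target string) []scenario {
	var out []scenario
	for _, s := range all {
		if target == "" || strings.EqualFold(s.Target, target) {
			out = append(out, s)
		}
	}
	return out
}

// runAll runs every selected scenario in its own goroutine, waits for
// all of them to finish, and returns each scenario's error keyed by name.
func runAll(selected []scenario, run func(scenario) error) map[string]error {
	var (
		mu sync.Mutex
		wg sync.WaitGroup
	)
	results := make(map[string]error, len(selected))
	for _, s := range selected {
		wg.Add(1)
		go func(s scenario) {
			defer wg.Done()
			err := run(s)
			mu.Lock()
			results[s.Name] = err
			mu.Unlock()
		}(s)
	}
	wg.Wait()
	return results
}

func main() {
	// Hypothetical flag: not an existing apm-server or go test option.
	target := flag.String("target", "", "only run scenarios for this target")
	flag.Parse()

	all := []scenario{
		{Name: "upgrade-7.17-to-8.15", Target: "snapshot", From: "7.17", To: "8.15"},
		{Name: "upgrade-8.13-to-8.15", Target: "snapshot", From: "8.13", To: "8.15"},
		{Name: "upgrade-8.13-to-8.15-bc", Target: "bc", From: "8.13", To: "8.15"},
	}

	selected := filterScenarios(all, *target)
	// The run function is a stand-in; a real test would deploy and assert here.
	results := runAll(selected, func(s scenario) error { return nil })
	fmt.Printf("ran %d of %d scenarios\n", len(results), len(all))
}
```

In an actual `go test` suite the same idea would more likely be expressed with `t.Run` subtests that call `t.Parallel()`, selected with the standard `-run` pattern, rather than with hand-written goroutines.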

So overall we can leverage a shared, stable testing framework while running only the relevant tests to keep the overall CI running time low.

Does this satisfy your concerns?

endorama avatar Mar 20 '25 10:03 endorama

The main line of work for this is completed and the new tests have feature parity with the previous tests.

With https://github.com/elastic/apm-server/pull/17085 the smoke test on ESS will be replaced by the new tests (which have been renamed to integration-server-test due to their scope).

That PR closes the work related to this issue.

Enhancement work for the new tests is tracked in https://github.com/elastic/apm-server/issues/17083.

endorama avatar Jun 04 '25 10:06 endorama