
EPIC: Testing environment checks

Open llxia opened this issue 5 years ago • 11 comments

TKG should run general test environment checks before and/or after running the tests. Below are some checks we could look at to decide whether we should implement them:

  • [x] disk space (fail if we do not have enough disk space)

  • [x] memory (fail if we do not have enough memory)

  • [ ] core (fail if a core file is generated after a test run, regardless of the test result)

  • [ ] list running processes before and after the test (fail if there is any dangling process?). This includes docker images left over from aborted runs; those are not expected to be present and should be pruned

  • [ ] a flag to enable/disable these test environment health checks
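
As a rough illustration of the disk space item and the enable/disable flag from the checklist, here is a minimal Python sketch. It is not how TKG implements these checks; the `TKG_ENV_CHECKS` variable name and the 1 GiB default threshold are assumptions made up for the example.

```python
import os
import shutil


def check_disk_space(path=".", min_free_bytes=1024**3):
    """Return (ok, free_bytes) for the filesystem that contains path.

    The 1 GiB default threshold is an arbitrary assumption.
    """
    free = shutil.disk_usage(path).free
    return free >= min_free_bytes, free


def env_checks_enabled(environ=os.environ):
    """Read an on/off switch for the health checks.

    TKG_ENV_CHECKS is a hypothetical variable name, not an existing TKG option.
    Checks default to enabled when the variable is unset.
    """
    value = environ.get("TKG_ENV_CHECKS", "true").strip().lower()
    return value not in ("0", "false", "no", "off")


if __name__ == "__main__":
    if env_checks_enabled():
        ok, free = check_disk_space(".")
        print(f"free bytes: {free}, enough: {ok}")
```

A memory check could follow the same shape, but the standard library has no portable call for available memory, so it is left out here.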

llxia avatar Mar 11 '20 19:03 llxia

Note that dumps for an OutOfMemoryError don't necessarily indicate a problem. The test could have intentionally caused it and caught the exception. OpenJ9 tests that do this tend to run with dumps turned off to avoid the overhead.

pshipton avatar Mar 19 '20 21:03 pshipton

Tests like those should delete whatever they create as part of the test. What I believe we will do is check at the very end, after all tests have run and as we zip up the test artifacts, whether any cores remain. If they do, it will be considered a failure. We may find when we first enable this that some tests need to be updated to clean up after themselves.
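
A minimal sketch of such an end-of-run scan, in Python. The file name patterns are assumptions about what core and dump files might be called on a given platform, and `find_core_files` is a hypothetical helper, not part of TKG.

```python
import fnmatch
import os

# Assumed name patterns for core/dump files; adjust per platform and JVM.
CORE_PATTERNS = ("core", "core.*", "*.dmp", "javacore*.txt")


def is_core_file(name):
    """True if a bare file name matches one of the assumed patterns."""
    return any(fnmatch.fnmatch(name, pattern) for pattern in CORE_PATTERNS)


def find_core_files(root):
    """Walk root and return sorted paths of files that look like cores."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_core_file(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    leftovers = find_core_files(".")
    if leftovers:
        print("core files left behind:")
        for path in leftovers:
            print("  " + path)
        raise SystemExit(1)
```

Running the scan only once, after all tests finish, leaves room for tests that trigger a dump on purpose and then delete it.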

smlambert avatar Mar 19 '20 22:03 smlambert

We can/should add one more type of check: whether the machine has the test prereqs installed. If not, fail and clearly list what is missing (or has a bad version).
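
One way a "list what is missing" check could look, sketched in Python. The tool names in the usage part are placeholders, not the actual prereq list, and version checking is left out.

```python
import shutil


def missing_prereqs(tools, which=shutil.which):
    """Return the tools from the given list that are not found on PATH.

    The which parameter is injectable so the lookup can be faked in tests.
    """
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    # Placeholder tool names; the real prereq list is not specified here.
    missing = missing_prereqs(["make", "perl", "ant"])
    if missing:
        print("missing test prereqs: " + ", ".join(missing))
        raise SystemExit(1)
    print("all prereqs found")
```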

smlambert avatar Mar 20 '20 02:03 smlambert

Related: https://github.com/AdoptOpenJDK/openjdk-infrastructure/issues/1410

It is becoming clear that this feature/issue needs to be addressed as soon as possible, as it will remove some of the random failures we are seeing in nightly pipelines, leaving more time for triagers to focus on real issues rather than tracking and chasing environment issues.

I recognize that if we were running in environments where we spin up on-demand machines each time, some of this would not be needed. But given that we will likely always need to support running on static machines as well, we need to try to clean the slate, or at least know the state of the slate, each time.

Prioritizing the environment checks, I would put the check for running processes and the pruning of docker images at a higher priority than the other checks, as those will completely block the next set of tests from running successfully on a test machine.
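
To make the running-process check concrete, here is a hedged Python sketch that snapshots the process table before and after a run and reports what is new. It assumes a POSIX `ps`; the function names are hypothetical, and a real check would also need to account for PID reuse and for unrelated processes starting on a shared machine.

```python
import subprocess


def parse_ps(output):
    """Parse lines of the form '<pid> <command line>' into {pid: command}."""
    procs = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            procs[int(parts[0])] = parts[1].strip()
    return procs


def snapshot_processes():
    """Return {pid: command} for all processes, excluding the ps call itself."""
    proc = subprocess.Popen(
        ["ps", "-e", "-o", "pid=", "-o", "args="],
        stdout=subprocess.PIPE,
        text=True,
    )
    out, _ = proc.communicate()
    procs = parse_ps(out)
    procs.pop(proc.pid, None)
    return procs


def new_processes(before, after):
    """Processes present after the run whose PID was not seen before it."""
    return {pid: cmd for pid, cmd in after.items() if pid not in before}


if __name__ == "__main__":
    before = snapshot_processes()
    # ... the test run would happen here ...
    after = snapshot_processes()
    for pid, cmd in sorted(new_processes(before, after).items()):
        print(f"possibly dangling: {pid} {cmd}")
```

Leftover docker images are a separate concern; pruning them (for example with `docker image prune`) would sit alongside this check rather than inside it.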

smlambert avatar Jun 24 '20 19:06 smlambert

Related: https://github.com/AdoptOpenJDK/openjdk-tests/issues/1887

smlambert avatar Jul 14 '20 13:07 smlambert

We will temporarily land a change to maketest.sh while awaiting a WIP fix for this issue. Once @nikolamilijevic1's PR is merged into TKG and we are sure it addresses dangling processes, we can remove https://github.com/AdoptOpenJDK/openjdk-tests/pull/2059.

smlambert avatar Nov 18 '20 17:11 smlambert

@llxia @renfeiw @sophia-guo - I have turned this into an EPIC that can be broken down into a set of smaller tasks (perhaps by the checklist in the description, or even more granular). What do you think?

smlambert avatar Jun 29 '21 21:06 smlambert

Related to the env check, the test framework should also check micro-architectures and execute/skip tests accordingly. For example, some VectorAPI tests only run on Z13/Z14 or newer, CRIU portability tests run on skylake, etc.

Related issues: runtimes/openj9-jit-z/issues/711 runtimes/infrastructure/issues/7037
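
A minimal Python sketch of an execute/skip decision, assuming a Linux machine where `/proc/cpuinfo` is readable. It uses CPU feature flags as a stand-in for the micro-architecture, which is an assumption of this example; the actual TKG checks may identify the hardware differently.

```python
def parse_cpu_flags(cpuinfo_text):
    """Extract the CPU feature flags from /proc/cpuinfo-style text.

    The key is 'flags' on x86 and 'features' or 'Features' on some other
    architectures, so the key is compared case-insensitively.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("flags", "features"):
            return set(value.split())
    return set()


def should_run(required_flags, cpu_flags):
    """Run a test only if every flag it requires is present on this CPU."""
    return set(required_flags) <= set(cpu_flags)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            flags = parse_cpu_flags(handle.read())
    except OSError:
        flags = set()
    # 'avx512f' is only an illustrative requirement for this example.
    print("run" if should_run(["avx512f"], flags) else "skip")
```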

llxia avatar Jun 27 '22 18:06 llxia

Added microarch check for skylake: https://github.com/adoptium/TKG/pull/346 and docker info check: https://github.com/adoptium/TKG/pull/351

renfeiw avatar Sep 07 '22 16:09 renfeiw

Related: https://github.com/adoptium/infrastructure/issues/2745 (being able to see when a machine was last updated).

smlambert avatar Sep 15 '22 17:09 smlambert

Add check for bash --version #547
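
For illustration only, a Python sketch of what a bash version check could look like. It parses the first line printed by `bash --version`; the minimum version used in the usage part is an assumption, not a stated TKG requirement.

```python
import re
import subprocess


def parse_bash_version(text):
    """Return (major, minor) parsed from `bash --version` output, or None."""
    match = re.search(r"version (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def bash_version():
    """Run `bash --version` and parse its output; None if bash is unavailable."""
    try:
        result = subprocess.run(
            ["bash", "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_bash_version(result.stdout)


if __name__ == "__main__":
    minimum = (4, 0)  # assumed minimum, for the sake of the example
    found = bash_version()
    if found is None or found < minimum:
        print(f"bash {minimum[0]}.{minimum[1]}+ required, found: {found}")
        raise SystemExit(1)
    print(f"bash version ok: {found[0]}.{found[1]}")
```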

llxia avatar May 09 '24 16:05 llxia