Enable taking node temporarily offline due to specific machine issue in Adoptium

Open sophia-guo opened this issue 11 months ago • 10 comments

Adding the parameter SLACK_CHANNEL to the configuration of https://ci.adoptium.net/view/Test_grinder/job/Test_Job_Auto_Gen/ allows a node to be taken offline temporarily when a specific machine issue is detected.

This issue was opened to track any problems that appear with this enabled.

  • [x] Script approval is needed to use new java.util.ArrayList: https://ci.adoptium.net/job/Test_openjdk21_hs_sanity.openjdk_x86-64_linux_testList_1/19/console
16:36:43  Test_openjdk21_hs_sanity.external_x86-64_linux #36 result is FAILURE. Checking console log for specific errors...
Scripts not permitted to use new java.util.ArrayList. Administrators can decide whether to approve or reject this signature.

  • [x] This error is not yet included in the checked errors: Exception: hudson.AbortException: Failed to run ssh-agent: mkdtemp: private socket dir: No space left on device https://ci.adoptium.net/job/Test_openjdk21_hs_special.system_x86-64_linux/28/console
  • [ ] Open a corresponding infra issue if this works fine. The current process for Jenkins auto-offlining machines that are low on space: when it happens, it is flagged in Nagios and in the infrastructure-bot channel, which the team monitors regularly, so action is taken once it gets our attention that way. At the moment the process isn't strict: whoever in the infra team picks it up can decide whether to raise an issue for it. https://adoptium.slack.com/archives/C53GHCXL4/p1730735251541479?thread_ts=1730227347.646299&cid=C53GHCXL4
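
For illustration only, the "checking console log for specific errors" step could be sketched roughly as below. This is not the actual aqa-tests pipeline code (which runs as a Jenkins script); the function names and the pattern list are hypothetical, and the two patterns are simply taken from the ssh-agent failure quoted above.

```python
import re
from typing import List

# Hypothetical list of machine-issue patterns. Both strings come from the
# "Failed to run ssh-agent ... No space left on device" error quoted above;
# the real job may check a different set of errors.
MACHINE_ISSUE_PATTERNS = [
    re.compile(r"No space left on device"),
    re.compile(r"Failed to run ssh-agent"),
]


def find_machine_issues(console_log: str) -> List[str]:
    """Return the stripped log lines that match any machine-issue pattern."""
    return [
        line.strip()
        for line in console_log.splitlines()
        if any(pattern.search(line) for pattern in MACHINE_ISSUE_PATTERNS)
    ]


def should_take_offline(console_log: str) -> bool:
    """True if the log contains at least one known machine-issue error."""
    return bool(find_machine_issues(console_log))
```

With a check like this, a failure such as the script-approval message would not count as a machine issue, while the "No space left on device" failure would mark the node as a candidate for being taken offline.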

sophia-guo, Oct 31 '24 14:10