Create testjob batch as an alternative method to determine build readiness

Open chaws opened this issue 4 years ago • 2 comments

Issue

There is no exact way to determine when a build is finished. Our current best approach is implemented here, and reports are sent based on this part. It works like this:

Consider wait_before_notification=30min, notification_timeout=120min and that all 3 jobs were submitted at 1pm:

| scenario 1          | scenario 2          | scenario 3           |
|---------------------|---------------------|----------------------|
| jobA ready in 10min | jobA ready in 20min | jobA ready in 20min  |
| jobB ready in 15min | jobB ready in 40min | jobB ready in 130min |
| jobC ready in 20min | jobC ready in 60min | jobC ready in 200min |

  • scenario 1
    • a report will be sent at 1:30pm with jobA, jobB and jobC results
  • scenario 2
    • because jobB and jobC weren't ready by 1:30pm, no report will be sent
    • at 3pm, notification_timeout will expire and a report will be sent with jobA, jobB and jobC results
  • scenario 3
    • because jobB and jobC weren't ready by 1:30pm, no report will be sent
    • at 3pm, notification_timeout will expire and a report will be sent with jobA results only, missing jobB and jobC results!
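
To make the three scenarios easy to replay, here is a minimal Python sketch of the timing rule exactly as described in the bullets above. It is not SQUAD's actual implementation: the function name `simulate_report` and the minutes-since-submission representation are assumptions made for illustration.

```python
def simulate_report(ready_after, wait_before_notification=30, notification_timeout=120):
    """Model the timing rule described in this issue (illustrative only).

    ready_after maps a job name to the number of minutes after submission
    at which that job's results are ready.
    Returns (minute at which the report is sent, sorted names of jobs included).
    """
    # Everything finished before the first check: report with all jobs.
    if all(t <= wait_before_notification for t in ready_after.values()):
        return wait_before_notification, sorted(ready_after)
    # Otherwise nothing is sent until the timeout expires, and the report
    # contains only the jobs that are ready by then.
    included = sorted(job for job, t in ready_after.items() if t <= notification_timeout)
    return notification_timeout, included


scenarios = {
    "scenario 1": {"jobA": 10, "jobB": 15, "jobC": 20},
    "scenario 2": {"jobA": 20, "jobB": 40, "jobC": 60},
    "scenario 3": {"jobA": 20, "jobB": 130, "jobC": 200},
}
results = {name: simulate_report(jobs) for name, jobs in scenarios.items()}
```

With jobs submitted at 1pm, minute 30 corresponds to 1:30pm and minute 120 to 3pm. For scenario 3 the sketch returns only `jobA`, which is the incomplete report this issue is about.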

Problem

So far users have managed to estimate suitable wait_before_notification and notification_timeout durations, but that's not optimal because other situations can make jobs take longer to complete, e.g. a busy LAVA server that keeps jobs queued for a long time.

We don't know how to determine when a build has all its jobs finished, mainly because we don't know how many jobs belong to a build (a new job can be submitted at any time). For the same reason, we don't know which job is the "last" job of a build.

The idea of a testplan was discussed before, but it's a painful mechanism to solve this problem.

Idea

The idea (co-thought-by: @roxell) is to create a new endpoint in qareports so that the job definitions for a given build are sent all at once, as a testjob set/batch. This could be a JSON file that looks like this:

jobset = {
    "build": "/group-slug/project-slug/build-version",
    "callback_url": "https://url-to-be-called-after-last-job-is-completed.com",
    "jobs": [
        {
            "environment": "environmentA",
            "backend": "the-backend-name",
            "definition": "the-yaml-job-definition",
        },
        {
            "environment": "environmentB",
            "backend": "the-backend-name",
            "definition": "the-yaml-job-definition",
        }
    ]
}

The main benefit of this approach is knowing for sure when a build is finished: there is now a way of knowing how many jobs a build has, so reports can be sent without depending on timeouts.

Note the callback_url here. It could be used by other CI systems, like gitlab-ci, to trigger other stages.
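
A rough sketch of how a batch could decide that a build is finished, assuming every job in the batch is known up front. Nothing here exists in SQUAD: `JobBatch`, `mark_finished` and the `notify` callable are hypothetical names, and `notify` stands in for whatever would actually call callback_url.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class JobBatch:
    """Hypothetical record of all jobs submitted together for one build."""
    build: str
    callback_url: str
    expected: Set[str]                       # ids of every job in the batch
    finished: Set[str] = field(default_factory=set)

    @property
    def is_complete(self) -> bool:
        # The build is finished once every expected job has reported back.
        return self.finished == self.expected

    def mark_finished(self, job_id: str, notify: Callable[[str], None]) -> bool:
        """Record a finished job; call notify once, when the last job completes."""
        if job_id not in self.expected:
            raise KeyError(f"{job_id} does not belong to {self.build}")
        was_complete = self.is_complete
        self.finished.add(job_id)
        if self.is_complete and not was_complete:
            notify(self.callback_url)        # stand-in for calling the URL
            return True
        return False


calls = []                                   # records URLs instead of calling them
batch = JobBatch(
    build="/group-slug/project-slug/build-version",
    callback_url="https://url-to-be-called-after-last-job-is-completed.com",
    expected={"job-1", "job-2"},
)
first = batch.mark_finished("job-1", calls.append)
second = batch.mark_finished("job-2", calls.append)
```

Because the number of jobs is fixed when the batch is created, completion no longer depends on wait_before_notification or notification_timeout. The callback fires exactly once, when the last expected job finishes.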

Thoughts?

cc: @mwasilew @danrue @terceiro @roxell @jscook2345

chaws, Sep 14 '20 20:09

I think we need to get an ID back from SQUAD when we submit this jobset, for instance via lava-test-plans, since one way to submit it would be one jobset per environment.

roxell, Sep 15 '20 13:09

@roxell one jobset per environment runs into the same issue we have right now. All jobs for all envs would have to be submitted in one call.

mwasilew, Sep 15 '20 13:09