squad
Create testjob batch as an alternative method to determine build readiness
Issue
There is no exact way of determining when a build is finished. Our current best approach is implemented here, and reports are sent based on it. It works like this:
Consider `wait_before_notification=30min`, `notification_timeout=120min`, and that all 3 jobs were submitted at 1pm:
scenario 1 | scenario 2 | scenario 3 |
---|---|---|
jobA ready in 10min | jobA ready in 20min | jobA ready in 20min |
jobB ready in 15min | jobB ready in 40min | jobB ready in 130min |
jobC ready in 20min | jobC ready in 60min | jobC ready in 200min |
- scenario 1
  - a report will be sent at 1:30pm with jobA, jobB and jobC results
- scenario 2
  - because jobB and jobC weren't ready by 1:30pm, no report will be sent
  - at 3pm, `notification_timeout` will expire and a report will be sent with jobA, jobB and jobC results
- scenario 3
  - because jobB and jobC weren't ready by 1:30pm, no report will be sent
  - at 3pm, `notification_timeout` will expire and a report will be sent with jobA results only, missing jobB and jobC results!
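The three scenarios can be reproduced with a small simulation. This is a simplified model of the behaviour described above, assuming the notification logic only checks once at `wait_before_notification` and again when `notification_timeout` expires; it is not SQUAD's actual implementation, and `simulate` and `Report` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Report:
    sent_at: int     # minutes after submission (1pm)
    jobs: List[str]  # jobs whose results are included


def simulate(ready_in: Dict[str, int],
             wait_before_notification: int = 30,
             notification_timeout: int = 120) -> Report:
    """Hypothetical model of the current timeout-based notification.

    ready_in maps a job name to the minutes it takes to become ready.
    """
    # First check: if every job is ready once wait_before_notification
    # has passed, the report goes out right away.
    ready = [job for job, t in ready_in.items() if t <= wait_before_notification]
    if len(ready) == len(ready_in):
        return Report(wait_before_notification, ready)
    # Otherwise the report waits for notification_timeout and includes
    # only the jobs that happen to be ready at that moment.
    ready = [job for job, t in ready_in.items() if t <= notification_timeout]
    return Report(notification_timeout, ready)


scenarios = {
    "scenario 1": {"jobA": 10, "jobB": 15, "jobC": 20},
    "scenario 2": {"jobA": 20, "jobB": 40, "jobC": 60},
    "scenario 3": {"jobA": 20, "jobB": 130, "jobC": 200},
}
for name, ready_in in scenarios.items():
    print(name, simulate(ready_in))
```

In this model, scenario 3 yields a report at minute 120 (3pm) containing only jobA, because nothing tells the notifier that jobB and jobC are still on their way.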
Problem
So far, users have had to estimate durations for `wait_before_notification` and `notification_timeout`, but that's not optimal, because other factors can make jobs take longer to complete, e.g. a busy LAVA server that keeps jobs queued for a long time.
We can't determine when all of a build's jobs have finished, mainly because we don't know how many jobs belong to a build (a new job can be submitted at any time). For the same reason, we don't know which job is the "last" one of a build.
The idea of a testplan has been discussed before, but it's a painful way to solve this problem.
Idea
The idea (co-thought-by: @roxell) is to create a new endpoint in qareports so that all job definitions for a given build are sent at once, as a testjob set/batch. This could be a JSON document that looks like this:
```python
jobset = {
    "build": "/group-slug/project-slug/build-version",
    "callback_url": "https://url-to-be-called-after-last-job-is-completed.com",
    "jobs": [
        {
            "environment": "environmentA",
            "backend": "the-backend-name",
            "definition": "the-yaml-job-definition",
        },
        {
            "environment": "environmentB",
            "backend": "the-backend-name",
            "definition": "the-yaml-job-definition",
        }
    ]
}
```
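Before accepting a jobset, the endpoint would need to check its shape; once it passes, the number of jobs in the build is known up front. A minimal sketch, assuming the keys shown in the jobset above are the whole contract; `validate_jobset` and `parse_build_path` are hypothetical names, and treating `callback_url` as optional is an assumption.

```python
from typing import Any, Dict, Tuple

# Assumed per-job contract, taken from the example jobset.
REQUIRED_JOB_KEYS = ("environment", "backend", "definition")


def parse_build_path(build: str) -> Tuple[str, str, str]:
    """Split "/group-slug/project-slug/build-version" into its three parts."""
    parts = build.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected /group/project/version, got {build!r}")
    group, project, version = parts
    return group, project, version


def validate_jobset(jobset: Dict[str, Any]) -> Tuple[str, str, str]:
    """Hypothetical server-side check; returns (group, project, version)."""
    for key in ("build", "jobs"):  # callback_url is treated as optional
        if key not in jobset:
            raise ValueError(f"missing key: {key}")
    if not jobset["jobs"]:
        raise ValueError("a jobset needs at least one job")
    for index, job in enumerate(jobset["jobs"]):
        missing = [k for k in REQUIRED_JOB_KEYS if k not in job]
        if missing:
            raise ValueError(f"job {index} is missing {missing}")
    return parse_build_path(jobset["build"])


jobset = {
    "build": "/group-slug/project-slug/build-version",
    "callback_url": "https://url-to-be-called-after-last-job-is-completed.com",
    "jobs": [
        {"environment": "environmentA", "backend": "the-backend-name",
         "definition": "the-yaml-job-definition"},
        {"environment": "environmentB", "backend": "the-backend-name",
         "definition": "the-yaml-job-definition"},
    ],
}
print(validate_jobset(jobset), len(jobset["jobs"]))
```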
The main benefit of this approach is that we would know for sure when a build is finished: there would now be a way of knowing how many jobs a build has, so reports could be sent without depending on timeouts.
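With the job count fixed at submission time, "build finished" becomes a simple counting check. A minimal sketch of that idea; `JobSetTracker` is a hypothetical name, and identifying jobs by their index in the jobset is an assumption (environments may repeat, so they can't serve as keys).

```python
from typing import Set


class JobSetTracker:
    """Hypothetical tracker: a build is finished once every job
    declared in its jobset has reached a final state."""

    def __init__(self, job_count: int) -> None:
        if job_count < 1:
            raise ValueError("a jobset needs at least one job")
        # Known up front from len(jobset["jobs"]); no timeout needed.
        self.job_count = job_count
        self.finished: Set[int] = set()

    @property
    def build_finished(self) -> bool:
        return len(self.finished) == self.job_count

    def mark_finished(self, job_index: int) -> bool:
        """Record a finished job; return True only for the job that
        completes the build, so the report is triggered exactly once."""
        if not 0 <= job_index < self.job_count:
            raise IndexError(f"job {job_index} is not part of this jobset")
        if job_index in self.finished:
            return False  # duplicate update, e.g. a job fetched twice
        self.finished.add(job_index)
        return self.build_finished


# Scenario 3 timings: jobA, jobB and jobC finish at minutes 20, 130 and 200.
tracker = JobSetTracker(job_count=3)
for minute, job_index in [(20, 0), (130, 1), (200, 2)]:
    if tracker.mark_finished(job_index):
        print(f"send report at minute {minute}")
```

Here the report waits for jobC instead of going out at a fixed deadline with partial results.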
Note the `callback_url` here. It could be used by other CI systems, such as GitLab CI, as a trigger for later stages.
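A callback could be a single HTTP request fired when the last job completes. A minimal sketch using only the standard library; `build_callback_request`, `send_callback` and the JSON payload (`build`, `status`) are hypothetical, since the proposal does not define what is sent to `callback_url`.

```python
import json
import urllib.request


def build_callback_request(callback_url: str, build: str) -> urllib.request.Request:
    """Hypothetical callback: a POST with a small JSON body. The payload
    shape is an assumption, not part of the proposal."""
    body = json.dumps({"build": build, "status": "finished"}).encode("utf-8")
    return urllib.request.Request(
        callback_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_callback(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Fire the callback and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


request = build_callback_request(
    "https://url-to-be-called-after-last-job-is-completed.com",
    "/group-slug/project-slug/build-version",
)
print(request.get_method(), request.full_url)
```

Building the request separately from sending it keeps the payload testable without network access; the receiving CI system decides what the call triggers.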
Thoughts?
cc: @mwasilew @danrue @terceiro @roxell @jscook2345
I think we need to get an ID back from SQUAD when we submit this jobset (via lava-test-plans, for instance), since one way to submit it would be one jobset per environment.
@roxell one jobset per environment runs into the same issue as we have right now. All jobs for all envs would have to be submitted with one call.
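Both points can be combined: a client could collect the definitions for every environment first, submit them in a single call, and get one ID back. A minimal sketch; `build_jobset` and `InMemoryJobSetStore` are hypothetical names, and the in-memory store only stands in for the proposed qareports endpoint.

```python
import itertools
from typing import Any, Dict, List


def build_jobset(build: str, callback_url: str, backend: str,
                 definitions_by_env: Dict[str, List[str]]) -> Dict[str, Any]:
    """Merge the definitions of all environments into one jobset."""
    jobs = [
        {"environment": env, "backend": backend, "definition": definition}
        for env, definitions in definitions_by_env.items()
        for definition in definitions
    ]
    return {"build": build, "callback_url": callback_url, "jobs": jobs}


class InMemoryJobSetStore:
    """Stand-in for the proposed endpoint: one call in, one ID out."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.jobsets: Dict[int, Dict[str, Any]] = {}

    def submit(self, jobset: Dict[str, Any]) -> int:
        jobset_id = next(self._ids)
        self.jobsets[jobset_id] = jobset
        return jobset_id


jobset = build_jobset(
    "/group-slug/project-slug/build-version",
    "https://url-to-be-called-after-last-job-is-completed.com",
    "the-backend-name",
    {"environmentA": ["definition-a1", "definition-a2"],
     "environmentB": ["definition-b1"]},
)
store = InMemoryJobSetStore()
jobset_id = store.submit(jobset)
print(jobset_id, len(jobset["jobs"]))
```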