[batch] Add Job Groups to Batch
This PR adds the job groups functionality as described in this RFC to the Batch backend and hailtop.batch_client. This includes supporting nested job groups up to a maximum depth of 5. Note that none of these changes are user-facing yet (hence no change log here).
The PRs that came before this one:
- #13475
- #13487
- #13810 (note that this database migration required a shutdown)
Subsequent PRs will need to implement the following:
- Querying job groups with the flexible query language (v2)
- Implementing job groups in the Scala Client for QoB
- Using job groups in QoB with cancel_after_n_failures=1 for all new stages of worker jobs
- UI functionality to page and sort through job groups
- A new hailtop.batch interface for users to define and work with Job Groups
A couple of nuances in the implementation came up that I also tried to articulate in the RFC:
- A root job group with ID = 0 does not belong to an update ("update_id" IS NULL). This means that any checks that look for "committed" job groups need to do (batch_updates.committed OR job_groups.job_group_id = %s) where "%s" is the ROOT_JOB_GROUP_ID.
- When job groups are cancelled, only the specific job group that was cancelled is inserted into job_groups_cancelled. This table does NOT contain all transitive job groups that were also cancelled indirectly. The reason is that we cannot guarantee that a user won't have millions of job groups, and we can't insert millions of records inside a single SQL stored procedure. Now, any query on the driver / front_end must look up the tree and see whether the job group or any of its ancestors has been cancelled. This code looks similar to the code below [1].
- There used to be DELETE FROM statements in commit_batch_update and commit_batch that cleaned up old records that were no longer used in job_group_inst_coll_cancellable_resources and job_groups_inst_coll_staging. This cleanup now occurs in a periodic loop on the driver.
- The job_group_inst_coll_cancellable_resources and job_groups_inst_coll_staging tables have values which represent the sum over a job group and all of its child job groups. For example, if a job group has 1 job and its child job group has 2 jobs, then the staging table would have n_jobs = 3 for the parent job group and n_jobs = 2 for the child job group. Likewise, all of the billing triggers and MJC have to use the job_group_self_and_ancestors table to modify the job group the job belongs to as well as its parent job groups.
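To make the n_jobs example in the last bullet concrete, here is a rough sketch of the roll-up in plain Python. It is not the actual trigger or stored procedure code: SELF_AND_ANCESTORS is a hypothetical in-memory stand-in for the job_group_self_and_ancestors table, rolled_up_n_jobs is a made-up helper name, and the root job group is left out for brevity.

```python
from collections import defaultdict

# Hypothetical stand-in for job_group_self_and_ancestors:
# each job group ID maps to itself plus every ancestor above it.
SELF_AND_ANCESTORS = {
    1: [1],     # parent job group
    2: [2, 1],  # child of job group 1
}


def rolled_up_n_jobs(jobs_per_group, self_and_ancestors):
    """Credit each group's own jobs to itself and to all of its ancestors."""
    totals = defaultdict(int)
    for job_group_id, n_own_jobs in jobs_per_group.items():
        for ancestor_id in self_and_ancestors[job_group_id]:
            totals[ancestor_id] += n_own_jobs
    return dict(totals)


# Parent has 1 job of its own, child has 2 jobs.
totals = rolled_up_n_jobs({1: 1, 2: 2}, SELF_AND_ANCESTORS)
print(totals)  # {1: 3, 2: 2}
```

The parent ends up with n_jobs = 3 and the child with n_jobs = 2, matching the staging table values described above.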
[1] Code to check whether a job group has been cancelled.
SELECT job_groups.*,
  cancelled_t.cancelled IS NOT NULL AS cancelled
FROM job_groups
LEFT JOIN LATERAL (
  SELECT 1 AS cancelled
  FROM job_group_self_and_ancestors
  INNER JOIN job_groups_cancelled
    ON job_group_self_and_ancestors.batch_id = job_groups_cancelled.id AND
      job_group_self_and_ancestors.ancestor_id = job_groups_cancelled.job_group_id
  WHERE job_groups.batch_id = job_group_self_and_ancestors.batch_id AND
    job_groups.job_group_id = job_group_self_and_ancestors.job_group_id
) AS cancelled_t ON TRUE
WHERE ...
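For anyone who wants to poke at the ancestor lookup in [1] locally, here is a small self-contained sketch using Python's sqlite3 module. It makes several assumptions: the three tables are cut down to just the columns the query touches, the sample rows are invented, and because SQLite has no LATERAL joins the check is written as a correlated EXISTS subquery instead. It shows the idea, not the query the driver or front_end actually runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified schemas: only the columns used by the lookup.
CREATE TABLE job_groups (batch_id INTEGER, job_group_id INTEGER);
CREATE TABLE job_group_self_and_ancestors (
    batch_id INTEGER, job_group_id INTEGER, ancestor_id INTEGER);
CREATE TABLE job_groups_cancelled (id INTEGER, job_group_id INTEGER);

-- Job group 2 is a child of job group 1; job group 3 is unrelated.
INSERT INTO job_groups VALUES (1, 1), (1, 2), (1, 3);
INSERT INTO job_group_self_and_ancestors VALUES
    (1, 1, 1), (1, 2, 2), (1, 2, 1), (1, 3, 3);

-- Only the directly cancelled job group is recorded.
INSERT INTO job_groups_cancelled VALUES (1, 1);
""")

rows = conn.execute("""
SELECT jg.job_group_id,
       EXISTS (
           SELECT 1
           FROM job_group_self_and_ancestors AS sa
           INNER JOIN job_groups_cancelled AS c
               ON sa.batch_id = c.id AND sa.ancestor_id = c.job_group_id
           WHERE sa.batch_id = jg.batch_id
             AND sa.job_group_id = jg.job_group_id
       ) AS cancelled
FROM job_groups AS jg
ORDER BY jg.job_group_id
""").fetchall()

cancelled = {job_group_id: bool(flag) for job_group_id, flag in rows}
print(cancelled)  # {1: True, 2: True, 3: False}
```

Job group 2 shows up as cancelled even though only job group 1 has a row in job_groups_cancelled, which is the transitive behaviour described above.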