Apply Azure Batch settings to all pools via config
New feature
Azure Batch exposes a number of per-pool settings, such as autoScale, vmType and sku. At the moment these must be written out explicitly for every pool, so any shared setting is repeated in each pool definition. For example, to make all pools use low-priority VMs (scale formula taken from the Microsoft blog):
// Scale formula to use low-priority nodes only.
lowPriorityScaleFormula = '''
lifespan = time() - time("{{poolCreationTime}}");
interval = TimeInterval_Minute * {{scaleInterval}};
$samples = $PendingTasks.GetSamplePercent(interval);
$tasks = $samples < 70 ? max(0, $PendingTasks.GetSample(1)) : max($PendingTasks.GetSample(1), avg($PendingTasks.GetSample(interval)));
$targetVMs = $tasks > 0 ? $tasks : max(0, $TargetLowPriorityNodes/2);
targetPoolSize = max(0, min($targetVMs, {{maxVmCount}}));
$TargetLowPriorityNodes = lifespan < interval ? {{vmCount}} : targetPoolSize;
$TargetDedicatedNodes = 0;
$NodeDeallocationOption = taskcompletion;
'''
azure {
    batch {
        pools {
            Standard_E2d_v4 {
                autoScale = true
                vmType = 'Standard_E2d_v4'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E8d_v4 {
                autoScale = true
                vmType = 'Standard_E8d_v4'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E16d_v4 {
                autoScale = true
                vmType = 'Standard_E16d_v4'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E32d_v4 {
                autoScale = true
                vmType = 'Standard_E32d_v4'
                vmCount = 2
                maxVmCount = 10
                scaleFormula = lowPriorityScaleFormula
            }
        }
    }
}
I'd like a way to declare the shared settings once and have them apply to all pools, so the config would look something like this:
azure {
    batch {
        pools {
            // magically apply the following to all pools
            '*' {
                autoScale = true
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E2d_v4 {
                vmType = 'Standard_E2d_v4'
            }
            Standard_E8d_v4 {
                vmType = 'Standard_E8d_v4'
            }
            Standard_E16d_v4 {
                vmType = 'Standard_E16d_v4'
            }
            Standard_E32d_v4 {
                vmType = 'Standard_E32d_v4'
                vmCount = 2
                maxVmCount = 10
            }
        }
    }
}
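To make the intended behaviour concrete, here is a rough sketch in Python rather than Nextflow config. It assumes the wildcard entry acts as a base layer and that each named pool overrides it key by key. The function name `resolve_pools` and the plain-dict representation are hypothetical and are not part of Nextflow.

```python
# Illustrative sketch only: models the proposed "apply to all pools" semantics
# with plain dicts. Nothing here is Nextflow API; resolve_pools is hypothetical.

def resolve_pools(pools, wildcard="*"):
    """Return the effective settings for every named pool.

    Assumption: the wildcard entry is a base layer, and a pool's own keys
    take precedence over it.
    """
    shared = pools.get(wildcard, {})
    return {
        name: {**shared, **opts}  # later dict wins, so pool keys override
        for name, opts in pools.items()
        if name != wildcard
    }


pools = {
    "*": {"autoScale": True, "vmCount": 2, "maxVmCount": 20},
    "Standard_E2d_v4": {"vmType": "Standard_E2d_v4"},
    "Standard_E32d_v4": {"vmType": "Standard_E32d_v4", "maxVmCount": 10},
}

effective = resolve_pools(pools)
print(effective["Standard_E2d_v4"]["maxVmCount"])   # inherited from '*'
print(effective["Standard_E32d_v4"]["maxVmCount"])  # overridden by the pool
```

Under this assumption only Standard_E32d_v4 needs to restate maxVmCount, because it is the one pool whose value differs from the shared setting.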
Usage scenario
Being able to apply generic configuration would let users define organisation-wide setups or redeployable pipelines. For example, a Tower pipeline could be moved freely between Tower Forge compute environments without rewriting the config every time.
Suggest implementation
Two options I can think of:
- A special `default` pool, similar to the special `auto` pool, whose settings apply to all pools until overridden. In the above example we would use this:
default {
    autoScale = true
    vmCount = 2
    maxVmCount = 20
    scaleFormula = lowPriorityScaleFormula
}
- Use globbing similar to `withName` in a process selector. The above example would be:
'.*' {
    autoScale = true
    vmCount = 2
    maxVmCount = 20
    scaleFormula = lowPriorityScaleFormula
}
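One detail worth settling for the second option is whether selectors are globs or regular expressions: `'*'` matches every pool name as a glob, while `'.*'` does so as a regular expression, and the two are not interchangeable. The Python sketch below illustrates the difference and one possible precedence rule. The helper names and the "later matching selector wins" rule are assumptions for illustration, not existing Nextflow behaviour.

```python
import re
from fnmatch import fnmatchcase

# Illustrative sketch only; none of these helpers exist in Nextflow.

def matches_glob(selector, pool_name):
    """Shell-style matching: '*' matches any pool name."""
    return fnmatchcase(pool_name, selector)


def matches_regex(selector, pool_name):
    """Regex matching against the whole pool name: '.*' matches any name."""
    return re.fullmatch(selector, pool_name) is not None


def settings_for(pool_name, selectors, matcher):
    """Merge the settings of every matching selector in declaration order.

    Assumption: a later matching selector overrides an earlier one.
    """
    merged = {}
    for selector, opts in selectors.items():
        if matcher(selector, pool_name):
            merged.update(opts)
    return merged


selectors = {
    ".*": {"autoScale": True, "maxVmCount": 20},
    "Standard_E32.*": {"maxVmCount": 10},
}

print(settings_for("Standard_E2d_v4", selectors, matches_regex))
print(settings_for("Standard_E32d_v4", selectors, matches_regex))
print(matches_glob(".*", "Standard_E2d_v4"))  # as a glob, '.*' needs a leading dot
```

Whichever matching style is chosen, using it consistently for pools and process selectors would avoid the case where a pattern that matches everything in one place matches nothing in the other.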
See also #4186 for the `publishDir` directive.
Note that when using the `autoPools` feature of Nextflow, you should be able to achieve this by assigning these settings to the `auto` pool. If you want to split the work across machines of several sizes, you may be able to follow the approach in this comment:
https://github.com/nextflow-io/nextflow/issues/4304#issuecomment-2066958304