Apply Azure Batch setting to all pools via config

Open adamrtalbot opened this issue 10 months ago • 2 comments

New feature

Azure Batch has a number of settings that apply to a pool, such as autoScale, vmType and sku. At present you must write this configuration out explicitly for every pool. For example, to make all pools use low-priority VMs (scale formula taken from the Microsoft blog):

// Scale formula to use low-priority nodes only.
lowPriorityScaleFormula = '''
    lifespan = time() - time("{{poolCreationTime}}");
    interval = TimeInterval_Minute * {{scaleInterval}};
    $samples = $PendingTasks.GetSamplePercent(interval);
    $tasks = $samples < 70 ? max(0, $PendingTasks.GetSample(1)) : max($PendingTasks.GetSample(1), avg($PendingTasks.GetSample(interval)));
    $targetVMs = $tasks > 0 ? $tasks : max(0, $TargetLowPriorityNodes/2);
    targetPoolSize = max(0, min($targetVMs, {{maxVmCount}}));
    $TargetLowPriorityNodes = lifespan < interval ? {{vmCount}} : targetPoolSize;
    $TargetDedicatedNodes = 0;
    $NodeDeallocationOption = taskcompletion;
'''

azure {
    batch {
        pools {
            Standard_E2d_v4 {
                autoScale = true
                vmType = 'Standard_E2d_v4'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E8d_v4 {
                autoScale = true
                vmType = 'Standard_E8d_v4'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E16d_v4 {
                autoScale = true
                vmType = 'Standard_E16d_v4'
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E32d_v4 {
                autoScale = true
                vmType = 'Standard_E32d_v4'
                vmCount = 2
                maxVmCount = 10
                scaleFormula = lowPriorityScaleFormula
            }
        }
    }
}

I'd like to simplify this to apply to all pools, somehow, so the config would look like this:

azure {
    batch {
        pools {
            // magically apply the following to all pools
            '*' {
                autoScale = true
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
            Standard_E2d_v4 {
                vmType = 'Standard_E2d_v4'
            }
            Standard_E8d_v4 {
                vmType = 'Standard_E8d_v4'
            }
            Standard_E16d_v4 {
                vmType = 'Standard_E16d_v4'
            }
            Standard_E32d_v4 {
                vmType = 'Standard_E32d_v4'
                vmCount = 2
                maxVmCount = 10
            }
        }
    }
}

Usage scenario

Being able to apply generic configuration lets users specify organisation-wide setups or redeployable pipelines. For example, a Tower pipeline could be moved freely between Tower Forge compute environments without rewriting the config every time.

Suggest implementation

Two options I can think of:

  • A special default pool, similar to the special auto pool, whose settings apply to every pool unless overridden. In the above example we would use this:
            default {
                autoScale = true
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
  • Use pattern matching on the pool name, similar to withName in a process selector. The above example would be:
            '.*' {
                autoScale = true
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }

adamrtalbot avatar Aug 29 '23 10:08 adamrtalbot

See also #4186 for publishDir directive

bentsherman avatar Aug 29 '23 14:08 bentsherman

Note that when using the autoPoolMode feature of Nextflow, you should be able to just assign these settings to the auto pool and achieve this. If you want to split it across multiple machine sizes, you may be able to do it as described in this comment:

https://github.com/nextflow-io/nextflow/issues/4304#issuecomment-2066958304
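
For reference, a minimal sketch of the auto pool approach, assuming autoPoolMode is enabled and that the auto pool accepts the same options as a named pool. It reuses the lowPriorityScaleFormula variable defined in the first snippet of this issue. It is untested and does not reproduce the linked comment:

```groovy
azure {
    batch {
        // Assumption: let Nextflow create pools on demand from the 'auto' template
        autoPoolMode = true
        pools {
            // Settings here are intended to apply to every automatically created pool
            auto {
                autoScale = true
                vmCount = 2
                maxVmCount = 20
                scaleFormula = lowPriorityScaleFormula
            }
        }
    }
}
```

No vmType is set here because the sketch assumes Nextflow picks the VM size from each process's resource requests. Per-size overrides, such as the lower maxVmCount for Standard_E32d_v4 above, are not covered by this approach.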

adamrtalbot avatar Apr 19 '24 17:04 adamrtalbot