Azure: Update to Azure config [WIP]

Open adamrtalbot opened this issue 1 year ago • 2 comments

Additions:

  • More Azure-specific features included in the nf-core config

Relates to #401


name: Azure
about: Add more features to Azure config

Please follow these steps before submitting your PR:

  • [x] If your PR is a work in progress, include [WIP] in its title
  • [x] Your PR targets the master branch
  • [ ] You've included links to relevant issues, if any

Steps for adding a new config profile:

  • [x] Add your custom config file to the conf/ directory
  • [x] Add your documentation file to the docs/ directory
  • [x] Add your custom profile to the nfcore_custom.config file in the top-level directory
  • [x] Add your custom profile to the README.md file in the top-level directory
  • [x] Add your profile name to the profile: scope in .github/workflows/main.yml

adamrtalbot avatar Aug 24 '22 14:08 adamrtalbot

The failing "Check if nextflow config runs in repository root" tests can be ignored; it's a bug in NXF.

jfy133 avatar Aug 30 '22 09:08 jfy133

@vsmalladi I haven't had a chance to check this. I'm on leave next week but when I get a chance I will give it a go. If it has everything we need and doesn't break anything we should smash that merge button.

adamrtalbot avatar Oct 15 '22 11:10 adamrtalbot

@vsmalladi I'm finding this raises an error with the autoscaling formula. It works fine if autoscaling is turned off. Have you found this?

Caused by:
  Status code 400, {
  "odata.metadata":"REDACTED/$metadata#Microsoft.Azure.Batch.Protocol.Entities.Container.errors/@Element","code":"AutoScalingFormulaEvaluationError","message":{
    "lang":"en-US","value":"The specified auto-scaling formula has evaluation error\nRequestId:e89088e4-6879-4f4c-9d49-f413864b358b\nTime:2022-10-25T15:44:23.3590143Z"
  },"values":[
    {
      "key":"Message","value":"Line 3, Col 25: Argument 1 is invalid"
    },{
      "key":"Result","value":"$TargetDedicatedNodes=0;$TargetLowPriorityNodes=0;$NodeDeallocationOption=requeue"
    }
  ]
}
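
For reference, a minimal sketch of what "autoscaling turned off" could look like as a separate override file. The file name `autoscale-off.config` is hypothetical, and the enclosing `azure { batch { ... } }` scope is an assumption based on the Nextflow Azure Batch configuration scope; the pool name `auto` and `params.vm_type` are taken from this thread.

```shell
# Hypothetical override file; the name autoscale-off.config is an assumption.
# The quoted 'EOF' stops the shell from expanding anything inside the heredoc.
cat > autoscale-off.config <<'EOF'
azure {
    batch {
        pools {
            auto {
                vmType    = params.vm_type
                autoScale = false   // fixed-size pool instead of a formula-driven one
                vmCount   = 1
            }
        }
    }
}
EOF
```

Passing this file with an extra `-c autoscale-off.config` after the main config should let its values take precedence, though that depends on how the main config defines the pool.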

adamrtalbot avatar Oct 25 '22 15:10 adamrtalbot

@adamrtalbot That is odd. I have not found that issue. Can you see your quota for Dedicated nodes in Dv3 family?

vsmalladi avatar Oct 25 '22 16:10 vsmalladi

Looks like plenty of nodes. If I turn off autoscaling it's fine. If I add a custom autoscaling formula to the pool it's fine:

pools {
    auto {
        vmType       = params.vm_type
        autoScale    = true
        vmCount      = 1
        maxVmCount   = 12
        scaleFormula = '''
            $TargetLowPriorityNodes = 1;
            $TargetDedicatedNodes   = 0;
            $NodeDeallocationOption = taskcompletion;
        '''
    }
}

Is the default autoscaling formula in Nextflow broken?

adamrtalbot avatar Oct 26 '22 09:10 adamrtalbot

@adamrtalbot What version of Nextflow and which pipeline are you testing?

I can give it a try on my end.

vsmalladi avatar Oct 26 '22 12:10 vsmalladi

nextflow -version

      N E X T F L O W
      version 22.04.5 build 5708
      created 15-07-2022 16:09 UTC 
      cite doi:10.1038/nbt.3820
      http://nextflow.io

java -version

openjdk version "17.0.4.1" 2022-08-12
OpenJDK Runtime Environment Temurin-17.0.4.1+1 (build 17.0.4.1+1)
OpenJDK 64-Bit Server VM Temurin-17.0.4.1+1 (build 17.0.4.1+1, mixed mode, sharing)

It fails with every pipeline, for example:

nextflow run \
    nf-core/sarek \
    -profile test \
    -c ~/configs/conf/azurebatch.config \
    -ansi-log false \
    -w az://work/ \
    --az_location $AZURE_BATCH_LOCATION \
    --batch_name $AZURE_BATCH_ACCOUNT_NAME \
    --batch_key $AZURE_BATCH_ACCOUNT_KEY \
    --storage_name $AZURE_STORAGE_ACCOUNT_NAME \
    --storage_key $AZURE_STORAGE_ACCOUNT_KEY
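
Because the command above passes the environment variables unquoted, an unset variable would silently drop an argument value. A small pre-flight check along these lines could rule that out before launching; the function name `check_azure_env` is hypothetical, and the variable names are the ones used in the command.

```shell
# Report any unset or empty variable that the run command expects.
# Returns 0 when all are set, 1 otherwise.
check_azure_env() {
    missing=0
    for name in AZURE_BATCH_LOCATION AZURE_BATCH_ACCOUNT_NAME AZURE_BATCH_ACCOUNT_KEY \
                AZURE_STORAGE_ACCOUNT_NAME AZURE_STORAGE_ACCOUNT_KEY; do
        # eval performs the indirect variable lookup in a POSIX-sh compatible way
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be chained in front of the run, for example `check_azure_env && nextflow run ...`.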

adamrtalbot avatar Oct 26 '22 13:10 adamrtalbot

@adamrtalbot I just tested this config and had no issues running it as is.

  N E X T F L O W
  version 22.10.0 build 5826
  created 13-10-2022 05:44 UTC (00:44 CDT)
  cite doi:10.1038/nbt.3820
  http://nextflow.io

Your custom formula targets low-priority nodes, whereas by default the autoscale would use dedicated nodes. Could you look at the quotas in your Batch account, under Total dedicated vCPUs and the Dv3 Series section, to see whether it's 0? If it's 0, then I think that would be the issue.
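
One way to probe the quota hypothesis, sketched under assumptions: flip the custom formula from earlier in the thread so that it requests one dedicated node and no low-priority nodes. The file name `dedicated-probe.config` is hypothetical, and the enclosing `azure { batch { ... } }` scope is assumed from the Nextflow Azure Batch configuration scope; the remaining settings are copied from the pool definition posted above.

```shell
# Hypothetical probe config; dedicated-probe.config is an assumed file name.
# The quoted 'EOF' keeps the $Target... variables literal instead of letting the shell expand them.
cat > dedicated-probe.config <<'EOF'
azure {
    batch {
        pools {
            auto {
                vmType       = params.vm_type
                autoScale    = true
                vmCount      = 1
                maxVmCount   = 12
                scaleFormula = '''
                    $TargetDedicatedNodes   = 1;
                    $TargetLowPriorityNodes = 0;
                    $NodeDeallocationOption = taskcompletion;
                '''
            }
        }
    }
}
EOF
```

If a run with this file cannot allocate a node while the low-priority variant can, that would point towards the dedicated quota; if both behave the same, the default formula itself is the more likely suspect.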

vsmalladi avatar Oct 27 '22 21:10 vsmalladi

@vsmalladi if you had no issues, shall we just go for it?

adamrtalbot avatar Nov 01 '22 18:11 adamrtalbot

@adamrtalbot Yeah, let's merge and see if anyone brings up any issues that we can solve.

vsmalladi avatar Nov 01 '22 18:11 vsmalladi