Azure Batch: Add disk size to slots calculation
New feature
When using Azure Batch, Nextflow will reject a process if it has too many CPUs for the worker machine.
Caused by:
Process requirement exceeds available CPUs -- req: 32; avail: 10
However, Azure Batch VMs come with a fixed disk, and it's common for a Nextflow process to run out of storage. There are many, many issues about this on the Nextflow Slack! The typical workaround is to increase the number of CPUs an individual process requires; however, it would be better to support the disk directive so we can directly enforce that the VMs have a disk of the right size.
Although we can't enforce it properly (i.e. make sure tasks are only assigned to a VM with enough space), being able to prevent users trying to run a task on a machine which is too small would catch some of the issues.
Usage scenario
When running on Azure Batch, raise an error if a task is assigned to a queue which does not contain sufficient storage.
Suggested implementation
process HELLO {
    disk 12.TB

    """
    echo Hello
    """
}

workflow {
    HELLO()
}
Caused by:
Process requirement exceeds available storage -- req: 12TB; avail: 1TB
how would you determine the available disk storage for a given VM / queue?
I'm thinking way simpler than that.
If process.disk = '1024GB' and the VM disk size is 512GB (we have the disk size), then prevent a job being submitted to that queue. It's not foolproof, but it would catch errors early.
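The early-rejection check described here could be sketched as follows. This is an illustrative helper, not Nextflow's actual code; the function name and message format mirror the existing CPU check above.

```python
# Hypothetical sketch of the proposed check: reject a task whose `disk`
# request exceeds the resource-disk size of the pool's VM type.
def check_disk(requested_gb, vm_disk_gb, vm_type):
    if requested_gb > vm_disk_gb:
        raise ValueError(
            f"Process requirement exceeds available storage -- "
            f"req: {requested_gb}GB; avail: {vm_disk_gb}GB ({vm_type})")

check_disk(100, 512, "Standard_E4ds_v5")    # fits, no error
# check_disk(1024, 512, "Standard_D4_v3")   # would raise
```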
I see, so if the user specifies a machine type or a queue with an implied machine type then we can use disk to validate the requirement, but if the user specifies cpus and memory with auto-pools enabled then we could actually use disk to narrow the query of valid machine types from this list. Does that sound right?
Correct. It's not perfect, but it might help catch a few mistakes.
We would need to add the disk directive to these three places: https://github.com/nextflow-io/nextflow/blob/735fb8bb4be6bc517500188bec86dc4fe2348225/plugins/nf-azure/src/main/nextflow/cloud/azure/batch/AzBatchService.groovy#L565 https://github.com/nextflow-io/nextflow/blob/735fb8bb4be6bc517500188bec86dc4fe2348225/plugins/nf-azure/src/main/nextflow/cloud/azure/batch/AzBatchService.groovy#L158-L176 https://github.com/nextflow-io/nextflow/blob/54ad624162f78f66ea8562fc6941442404672648/plugins/nf-azure/src/main/nextflow/cloud/azure/batch/AzBatchService.groovy#L158-L176
Better idea: we just turn the disk size into one of the compute slots. E.g., a task requiring 1 CPU, 1GB of memory and 128GB of storage on a machine with 16 cores, 64GB of memory and 256GB of storage would currently occupy 1 slot. If we update the system, it will occupy 8 of 16 slots. See relevant code here:
https://github.com/nextflow-io/nextflow/blob/c713ad510b76e6483d16292b4a66f6eb05773d36/plugins/nf-azure/src/main/nextflow/cloud/azure/batch/AzBatchService.groovy#L245-L269
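The proposal amounts to making the slot count the maximum of the three resource fractions rather than just CPU and memory. A minimal sketch of the idea (not the actual Groovy in AzBatchService, and the function name is made up):

```python
import math

def required_slots(cpus, mem_gb, disk_gb, vm_cpus, vm_mem_gb, vm_disk_gb):
    """Slots a task occupies on a VM, treating disk like CPUs and memory.

    Each VM exposes vm_cpus slots; a task takes the largest fraction of
    CPU, memory, or disk it consumes, rounded up to whole slots.
    """
    frac = max(cpus / vm_cpus, mem_gb / vm_mem_gb, disk_gb / vm_disk_gb)
    return min(vm_cpus, math.ceil(frac * vm_cpus))

# Example from the comment above: a 1 CPU / 1 GB / 128 GB task on a
# 16-core / 64 GB / 256 GB VM takes 8 of 16 slots instead of 1.
print(required_slots(1, 1, 128, 16, 64, 256))  # -> 8
```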
Sounds good to me. Care to give it a go? 😄 You have everything you need from the TaskRun and AzVmPoolSpec which are provided to the slots function
Post summit. Maybe on the plane 😆
Azure Batch pool VMs can be created with whatever OS disk size is required. I created this one with 1TB:
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       969G  2.9G  967G   1% /
My issue is that Azure Batch does not respect the disk directive. My containers are created with 98GB; I need to create them with 200GB:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        98G  2.5G   91G   3% /mnt/batch/tasks
I allocated 1TB in my pool - that should be configurable in azure.batch... it is in google.batch. I had to manually create a pool using an ARM template.
How do I create my container with /mnt/batch/tasks = 200GB ?
It seems to me that underlying configuration issues should be addressed in the batch section of the config. Creating your own pool is quite time-consuming. I had to let Nextflow create the pool and then export the ARM template to understand the settings, then add to the ARM template:
...
"deploymentConfiguration": {
"virtualMachineConfiguration": {
...
"osDisk": {
"diskSizeGB": 1000
}
Letting Nextflow do it for you is much easier: set autoPoolMode = true and deletePoolsOnCompletion = false. Then you have a leftover pool for use by Tower. If we had the settings as part of batch, i.e.:
batch {
    allowPoolCreation = true
    autoPoolMode = true
    deletePoolsOnCompletion = false
    auto {
        autoScale = true
        vmType = 'Standard_D2_v3'
        vmCount = 3
        maxVmCount = 3
        lowPriority = true
        disk = 2000
    }
}
The pool VMs would then be created with whatever size batch.{pool}.disk specifies. Google Batch uses google.batch.bootDiskSize = 200.GB; naming should be consistent with other executors (not sure if Google's is).
Separately, process.disk should be respected as a process directive, because that is how Nextflow works in other modes. How else am I supposed to create a container with more than 98GB?
If you need to know the disk size, ask the VM, or ask the Azure Batch pool how big the VM's disk is.
I think if you are trying to determine whether there is enough VM storage to support process.disk = ###GB, then the simplest way is to ask the VM how much storage is available. Nextflow presumably communicates with the VMs, so why not simply interrogate the VM for how much storage is available? If interrogating the VM does not work, Nextflow has the Batch credentials, so it can ask Batch how large the OS disk is for a VM, but that does not tell you how much is in use.
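"Asking the VM" can be done with a plain filesystem query on the node. A sketch using the Python standard library; the path is the Azure Batch working directory mentioned elsewhere in this thread, with "/" as a fallback when running outside a Batch node:

```python
import shutil

def available_gb(path="/"):
    """Free space in GB at the given mount point, as the node sees it.

    On an Azure Batch node you would pass "/mnt/batch/tasks"; here we
    default to "/" so the sketch runs anywhere.
    """
    usage = shutil.disk_usage(path)   # named tuple: (total, used, free), bytes
    return usage.free / 1e9

print(f"{available_gb('/'):.1f} GB free")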
Hi @qup9, you're absolutely right.
We have 2 options:
- Support setting the Azure VM disk size via the Azure Batch disk directive. Until recently this wasn't possible with the Azure Batch SDK, but it looks like it might be now.
- I have a pull request open for respecting the disk directive, although the logic isn't quite correct; I'd appreciate any input on fixing it: https://github.com/nextflow-io/nextflow/pull/5120
I created a machine of Standard_E4ds_v5 type with a 512GB disk and SSH'd onto it. As you can see, the Azure Batch worker directory is still only 150GB (the VM's built-in resource disk):
batch-explorer-user@e427c9e54ac049a3b5146d32b2227f87000000:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 64M 1 loop /snap/core20/2264
loop1 7:1 0 91.9M 1 loop /snap/lxd/24061
loop2 7:2 0 38.8M 1 loop /snap/snapd/21465
sda 8:0 0 512G 0 disk
├─sda1 8:1 0 511.9G 0 part /
├─sda14 8:14 0 4M 0 part
└─sda15 8:15 0 106M 0 part /boot/efi
sdb 8:16 0 150G 0 disk
└─sdb1 8:17 0 150G 0 part /mnt
sr0 11:0 1 628K 0 rom
Still, if your Docker container is 1TB (please tell me that's not true!) then this will help you increase the OS disk size. However, that storage isn't used for the Azure Batch working storage (/mnt), and remapping it causes issues. This is a limitation of Azure Batch they need to fix if they want to be a serious competitor to AWS and GCP.
OK I've worked out how you can use the OS Disk.
If you make an Azure Batch pool with an OS disk of 1023GB (the largest size possible), then you can modify the start task to be this:
bash -c "chmod +x azcopy && mkdir -p ${AZ_BATCH_NODE_SHARED_DIR}/bin/ && cp azcopy ${AZ_BATCH_NODE_SHARED_DIR}/bin/ && curl -s https://gist.githubusercontent.com/adamrtalbot/c20dfcb47bd92d92bee74606eb707521/raw/a9e3418b860cc79913b37eeffddae9990539cae3/remount.sh | bash -s"
The startTask is basically running this script, which swaps out the OS disk for the storage disk at /mnt:
rsync -avz /mnt/ /tmp/batch
umount -lf /mnt
mv /tmp/batch /mnt
Is this a good idea? Almost certainly not. Does it work? Not really, it breaks almost all the logging and settings.
Let me clarify (from the ARM template). By creating the pool and setting:
...
"deploymentConfiguration": {
  "virtualMachineConfiguration": {
    ...
    "osDisk": {
      "diskSizeGB": 1000
    }
each VM in the pool receives a 1000GB OS disk, regardless of VMType. This should be configurable in a batch.{pool}.disk (or similar) config parameter.
The containers spawned by a pool VM receive storage at /mnt/batch/tasks, based on some calculation of the VMType. I have observed:
STANDARD_D2_V3 => 49G
STANDARD_D4_V3 => 98G
How is process storage size being set? It should be based on process.disk in the config. Is it hard-coded in Nextflow somewhere instead? Can you point me to the files/methods where this takes place? I am not a Groovy developer.
It's not a Nextflow limitation; it's from Azure Batch. In Azure Batch it is impossible to modify the size of the resource disk; whatever comes with the VM size is what you get. The OS disk which you refer to in the ARM template is not used by Azure Batch at all.
My understanding of how Azure Batch is working must be wrong…
It appears that physical VMs are created based on VMType; diskSizeGB controls the disk size of those VMs. I can connect to the node and verify the size used:
df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       969G  2.9G  967G   1% /
Within one node, Docker appears to be running the Nextflow head as a container:
sudo docker ps -a
CONTAINER ID   IMAGE                                         COMMAND                  CREATED              STATUS              PORTS   NAMES
d20c99463fd7   nextflow384688088.azurecr.io/az_pass:latest   "sh -c 'bash .comman…"   About a minute ago   Up About a minute           elated_leavitt
cbfaf382f653   quay.io/seqeralabs/nf-launcher:j17-24.04.4    "/usr/local/bin/nf-l…"   2 hours ago          Up 2 hours                  romantic_sammet
I can connect to the node and use docker commands and list running containers.
After that it is fuzzier… The Nextflow head appears to launch additional containers for each process, placing them on VM instances (nodes) as appropriate:
sudo docker ps -a
CONTAINER ID   IMAGE                           COMMAND                  CREATED          STATUS                      PORTS   NAMES
0abf6bd53f8a   xxx.azurecr.io/az_pass:latest   "sh -c 'bash .comman…"   6 seconds ago    Up 4 seconds                        jolly_blackburn
f3f03afc6f2c   xxx.azurecr.io/az_pass:latest   "sh -c 'bash .comman…"   2 minutes ago    Up 2 minutes                        pensive_wiles
30082863d1e4   xxx.azurecr.io/az_pass:latest   "sh -c 'bash .comman…"   2 minutes ago    Exited (0) 10 seconds ago           suspicious_fermi
5057447b85e6   xxx.azurecr.io/az_pass:latest   "sh -c 'bash .comman…"   47 minutes ago   Exited (0) 47 minutes ago           amazing_einstein
What is not clear is how a process container is launched.
Please explain/correct my thinking.
Correct, when using Seqera Platform the Azure
Nextflow is submitting work to the Azure Batch API. It schedules a job (queue) and a task per instance of a process, which the Azure Batch service will assign to an available virtual machine. Nextflow then watches for task completion before proceeding with the pipeline.
It is the Azure Batch service which launches and manages the docker containers. Each task specification will include a container and a command line invocation to be triggered. You can watch this happen in real time using the Azure Batch explorer.
Now for the storage part: Azure Batch creates unique subdirectories to perform the work in, to keep everything unique and isolated. It does this within the /mnt/batch/tasks/ directory.
In my example above, you can see an OS disk mounted at / (sda1) and the Azure Batch working disk located at /mnt (sdb1). From this, you can see you could modify the size of sda1 as much as you would like but it would never affect the size of the working directory for Azure Batch, which is where the Docker images and working directories are stored.
Reviewing the pool template: https://learn.microsoft.com/en-us/azure/templates/microsoft.batch/batchaccounts/pools?pivots=deployment-language-arm-template
I see a section:
"dataDisks": [
  {
    "caching": "string",
    "diskSizeGB": "int",
    "lun": "int",
    "storageAccountType": "string"
  }
],
I think if a data disk is included with lun = 1 and diskSizeGB = 1TB… then sdb1 will end up 1TB.
No, a data disk is mounted at sdc1 and not even formatted!
Trust me, I've tried 😭
Correction…
Looking at the same dataDisks section again: I think if a data disk is included with lun = 0 and diskSizeGB = 1TB… then sdb1 will end up 1TB. Examining a node seems to show sda and sdb operating on LUN 0.
Does lun=0 end up as sdc or sdb?
Do you control the startup sequence of the nodes? If so, can't you format whatever data disk is created and then mount it to /mnt? Proposed sequence:
batch.{pool}.disk = 1000.GB
During pool creation, a data disk is added of size batch.{pool}.disk
During node initialization, the data disk is formatted and mounted to /mnt
Give it a go 😉 If you manage to get something working I would be very interested and I'm sure we can recreate it in Nextflow code.
After some hacking, I can see the problem (I may have some details wrong):
Batch uses the temp disk of each node to host container data under /mnt.
Nodes are sized with fixed resource disks (the temp disk). Choices can be identified here: https://azureprice.net/ (export as CSV and use Excel for details not visible on the page).
This will cause issues for my work: even if I assign 4-core VMs to the nodes, the cost is 5X or more if I need more than 100GB of temp disk. Given that a 5GB accession requires ~100GB to process via fasterq-dump, processing larger files in Azure will become very expensive.
Jobs are actually created on each node. Nextflow uses the workitems folder for writing local data before pushing to storage accounts; /mnt has various content for Docker, etc. The actual folder used on the node by the process containers is /mnt/batch/tasks/workitems, which is empty at startup, so we should be able to move it.
Proposed steps:
1. Set a disk-size option in batch for sda1
2. Create /mnt/batch/tasks/workitems (which already exists ???)
3. Create /workitems on sda1
4. Mount /mnt/batch/tasks/workitems to /workitems
5. Set correct permissions
I tried creating a script to mount /mnt/batch/tasks/workitems to a folder on the OS disk, but it did not work. (My Linux is not that good; I think someone who knows Linux should be able to.)
This is exactly what I attempted here: https://github.com/nextflow-io/nextflow/issues/4920#issuecomment-2304738273
It was definitely a hack and didn't work very well 🤦 .
@vsmalladi this is exactly what I was talking about. If the Azure Batch service gave us an API to configure the disk size we wouldn't have to spend 4x the appropriate cost
A Standard_D4d_v5 comes with a 150GB disk and 4 cores. If you need more memory, a Standard_E4d_v5 is the same but has double the memory. I generally recommend Standard_E*d_v5 machines for most bioinformatics work. The d suffix indicates it comes with a local disk, compared to a Standard_D4_v5, which does not.
Vantage now supports Azure which gives you a good overview of the machines: https://instances.vantage.sh/azure/
Here is the ammo to get Azure to change the implementation:
The largest 4-core resource disk is 600GB, costing ~$0.10/hr; one with no resource disk is ~$0.02/hr.
fasterq-dump typically requires approx. 18x the accession size. If I start with a 20GB accession (not uncommon), I need 360GB of resource disk to complete. If I have 2 jobs running on the node (I don't need 4 cores for fasterq-dump), I run out of space. If I have a 35GB accession, I run out of space. Meanwhile, in both cases the OS disk is sitting there with 1TB or more of unused storage.
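The arithmetic above can be turned into a quick capacity check. The ~18x expansion factor is the rule of thumb quoted in this thread, not an exact figure, and the function is purely illustrative:

```python
# Rough capacity check based on the ~18x fasterq-dump expansion factor
# observed above (a rule of thumb, not an exact figure).
FASTERQ_FACTOR = 18

def fits(acc_gb, resource_disk_gb, concurrent_jobs=1):
    """Does processing fit on the node's resource disk?"""
    needed = acc_gb * FASTERQ_FACTOR * concurrent_jobs
    return needed <= resource_disk_gb

print(fits(20, 600))      # one 20 GB accession on a 600 GB disk -> True (360 GB)
print(fits(20, 600, 2))   # two at once -> False (720 GB needed)
print(fits(35, 600))      # one 35 GB accession -> False (630 GB needed)
```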
I think the implementation has to change so that: Nextflow requests disk space per process and launches the job requesting that disk space; Batch creates and mounts a disk for each job, assigning that disk as /workitems. This would end up similar to Google Cloud's implementation, where this is not an issue.
New theory: use a VM WITHOUT a resource disk, i.e. Standard_D2_v5 or Standard_D4_v4. My Azure account is dead till the end of the month or I would try it myself. We can enlarge the OS disk in the batch pool template, but it defaults to 1TB, which would be plenty for my needs. Notes suggest performance would be impacted, but probably not as bad as not working. If this works, 4 tasks could run with 0.5 CPU each without concern of using up the disk.
Per my research, mounting behavior on a VM without a resource disk:
- Default mount location: When you mount a directory like /mnt/batch/tasks inside a Docker container, the mount point corresponds to a location on the host VM's filesystem. If the VM lacks a resource disk, there is no separate disk available for temporary storage, so the mount point will reside on the OS disk.
- OS disk usage: This means that any data written to /mnt/batch/tasks in the Docker container will consume space on the OS disk. Since the OS disk is typically smaller and not designed for high I/O operations like a resource disk, this could impact performance, especially if the tasks are I/O intensive or generate a lot of temporary data.
Considerations:
- Performance: Writing to the OS disk might not be as fast as using a resource disk, potentially leading to slower performance for tasks that rely heavily on temporary storage.
- Disk space: The OS disk has limited space, and depending on the size of your tasks, you might quickly run out of space if the data written to /mnt/batch/tasks is substantial.
- Best practices: If you need significant temporary storage and your chosen VM type lacks a resource disk, consider using an Azure Managed Disk, Azure Files, or Blob Storage as an alternative, although this may involve some trade-offs in terms of cost and performance.