Use `ephemeral-storage` for `disk` directive and `k8s` executor

Open bentsherman opened this issue 2 years ago • 4 comments

Closes #2986

This PR includes the following changes:

  • add support for emptyDir ephemeral volume mounts in k8s
  • add support for disk directive in k8s via ephemeral-storage resource type and emptyDir volume

bentsherman avatar Jun 27 '22 21:06 bentsherman

I need to test it to make sure, but based on the K8s docs, this PR should make it possible to request local disk storage for k8s tasks. You have to specify both the disk directive and the emptyDir pod option in your Nextflow config:

process {
    disk = 10.GB
    pod = [ [emptyDir: [:], mountPath: '/scratch'] ]
    scratch = '/scratch' // direct tasks to use the local storage
}

As a side bonus, you can also use emptyDir backed by memory:

process {
    memory = 10.GB
    pod = [ [emptyDir: [medium: 'Memory'], mountPath: '/scratch'] ]
    scratch = '/scratch'
}
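
The same settings could presumably be scoped to a subset of processes rather than applied globally. The following is only a sketch under that assumption: the label name `local_scratch` and the 50 GB figure are hypothetical, and it uses the standard `withLabel` selector from Nextflow configuration.

```groovy
// nextflow.config (sketch; 'local_scratch' is a hypothetical label)
process {
    executor = 'k8s'

    // Only processes tagged with this label request node-local scratch space
    withLabel: 'local_scratch' {
        disk = 50.GB                                      // requested as ephemeral-storage
        pod = [ [emptyDir: [:], mountPath: '/scratch'] ]  // emptyDir volume mounted in the task pod
        scratch = '/scratch'                              // run the task inside the mounted volume
    }
}
```

A process would opt in by declaring `label 'local_scratch'`; processes without the label keep their default settings.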

bentsherman avatar Jun 27 '22 22:06 bentsherman

Hello, this seems to be a must for k8s 1.24, where ephemeral-storage is enforced under some circumstances.

The node was low on resource: ephemeral-storage. Container nf-7884aa3757d610b56f730720f484453f was using 61810400Ki, which exceeds its request of 0.

@xhejtman I think I have seen this issue occasionally for earlier versions as well, but I haven't figured out why it happens. It seems to happen when the underlying node storage is exhausted, regardless of how much ephemeral-storage was allocated.

bentsherman avatar Sep 12 '22 13:09 bentsherman

Yes, this was my fault: the node was mistakenly configured without enough local storage, so I discarded the comment. But of course it can happen under normal circumstances as well. The more interesting question is why this problem caused the pipeline to fail instead of re-running the task.

xhejtman avatar Sep 12 '22 13:09 xhejtman

Feel free to submit an issue for it. If you can identify the exception that was thrown, then we can check the error handling logic and possibly change it to trigger a retry instead of workflow termination.

bentsherman avatar Sep 12 '22 13:09 bentsherman

I don't see what the difference is between this and https://github.com/nextflow-io/nextflow/pull/2988?

pditommaso avatar Nov 13 '22 21:11 pditommaso

The other PR adds support for the csi ephemeral volume type, whereas this PR adds support for the emptyDir ephemeral volume type (and uses it to implement the disk directive for K8s).

Basically these PRs add two different ephemeral volume types. Ephemeral volumes must be added on a case-by-case basis because you can't provision them with a PVC. ConfigMap and Secret are two ephemeral volume types that we already support, so these new types are supported in much the same way. I think both are worth supporting because csi allows you to use remote secret vaults and emptyDir allows you to allocate per-task scratch storage.
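
To illustrate how the volume types sit side by side, here is a sketch of a config that mounts a ConfigMap, a Secret, and an emptyDir through the same `pod` setting. The resource names (`my-config/settings.txt`, `my-secret/token`) and mount paths are hypothetical, and the csi option from the other PR is left out because its exact syntax is not shown in this thread.

```groovy
// nextflow.config (sketch; all resource names and paths are placeholders)
process {
    pod = [
        // already-supported types: a ConfigMap key and a Secret key mounted as files
        [config: 'my-config/settings.txt', mountPath: '/etc/settings.txt'],
        [secret: 'my-secret/token',        mountPath: '/etc/token'],
        // type added by this PR: per-task scratch space
        [emptyDir: [:],                    mountPath: '/scratch']
    ]
}
```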

bentsherman avatar Nov 14 '22 00:11 bentsherman

OK, I've merged with master and resolved the conflicts. @bentsherman, please check that everything is fine: c353d8aad448d4ffdc3f0816fcc27c277fc0b115

pditommaso avatar Nov 14 '22 18:11 pditommaso

@pditommaso Everything looks good and the test worked. This one is ready to merge.

bentsherman avatar Nov 16 '22 17:11 bentsherman

Hi, sorry, but is this now fully supported? I tried running a process with:

    disk = 10.GB
    pod emptyDir: [:]
    pod mountPath: '/scratch'

added to the specific process, but it returned:

Caused by:
  Unknown pod options: [emptyDir:[:]]

using nextflow version 22.10.5.5840

ebioman avatar Jan 18 '23 14:01 ebioman

Hi @ebioman, it was not included in v22.10. Based on the merge commit, it is available in all of the edge releases since then. I would just use the latest edge release.
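
For reference, a hedged sketch of the process-level form. The error above lists only `emptyDir`, which suggests that each `pod` line was parsed as a separate option, so this sketch assumes `emptyDir` and `mountPath` belong together in a single `pod` directive. The process name and script body are hypothetical, and it needs a release that includes this PR.

```groovy
// main.nf (sketch; 'example_task' and its script are placeholders)
process example_task {
    disk 10.GB                                  // requested as ephemeral-storage on k8s
    pod emptyDir: [:], mountPath: '/scratch'    // one directive carrying both keys
    scratch '/scratch'                          // run the task in the mounted volume

    script:
    """
    echo "task working directory: \$PWD"
    """
}
```

Note that directives inside a process body are written without `=`, unlike the `process { ... }` config scope.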

bentsherman avatar Jan 18 '23 15:01 bentsherman

Ah sorry, I missed that 🙂 Thanks a lot for the info. I will definitely try the edge release then 👍

ebioman avatar Jan 18 '23 17:01 ebioman