nextflow icon indicating copy to clipboard operation
nextflow copied to clipboard

pbspro: ignore resource directives if -l option is specified

Open bentsherman opened this issue 1 year ago • 4 comments

Closes #2264

Alternative solution to #2971

This PR changes the pbspro executor to ignore cpus and memory directives if clusterOptions contains -l. That way, you can do something like:

clusterOptions = { "-l select=1:ncpus=${task.cpus}:mem=${task.memory.toMega()}mb:..." }

Currently this doesn't work because Nextflow generates two -l lines which is invalid. With this PR, Nextflow will only generate the one -l line from clusterOptions.

Could be useful for the other grid executors as well.

bentsherman avatar Jul 07 '22 15:07 bentsherman

Probably should apply this change to all of the grid executors at once so that clusterOptions is consistent. I will review the other executors to see which ones should be updated.

bentsherman avatar Jul 15 '22 03:07 bentsherman

Upon review, none of the other grid executors seems to have this restriction of one resource list, so I think we should go ahead and merge this one.

bentsherman avatar Jul 18 '22 19:07 bentsherman

pkg:maven/io.nextflow/[email protected]

1 Critical, 9 Severe, 0 Moderate, 0 Unknown vulnerabilities have been found across 1 dependencies

Components
    pkg:maven/io.nextflow/[email protected]
      CRITICAL Vulnerabilities (1)

        Uncontrolled Resource Consumption

        moment is a JavaScript date library for parsing, validating, manipulating, and formatting dates. Affected versions of moment were found to use an inefficient parsing algorithm. Specifically using string-to-date parsing in moment (more specifically rfc2822 parsing, which is tried by default) has quadratic (N^2) complexity on specific inputs. Users may notice a noticeable slowdown is observed with inputs above 10k characters. Users who pass user-provided strings without sanity length checks to moment constructor are vulnerable to (Re)DoS attacks. The problem is patched in 2.29.4, the patch can be applied to all affected versions with minimal tweaking. Users are advised to upgrade. Users unable to upgrade should consider limiting date lengths accepted from user input.

        CVSS Score: 7.5

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

        CWE: CWE-400

      SEVERE Vulnerabilities (9)
        sonatype-2021-0900

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        CVSS Score: 6.5

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:N

        CWE: CWE-79

        CVE-2018-14040

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        In Bootstrap before 4.1.2, XSS is possible in the collapse data-parent attribute.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        CVE-2018-14041

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        In Bootstrap before 4.1.2, XSS is possible in the data-target property of scrollspy.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        CVE-2018-14042

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        In Bootstrap before 4.1.2, XSS is possible in the data-container property of tooltip.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        CVE-2019-11358

        Improperly Controlled Modification of Object Prototype Attributes ('Prototype Pollution')

        jQuery before 3.4.0, as used in Drupal, Backdrop CMS, and other products, mishandles jQuery.extend(true, {}, ...) because of Object.prototype pollution. If an unsanitized source object contained an enumerable proto property, it could extend the native Object.prototype.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-1321

        CVE-2019-8331

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        In Bootstrap before 3.4.1 and 4.3.x before 4.3.1, XSS is possible in the tooltip or popover data-template attribute.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        CVE-2020-11023

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        In jQuery versions greater than or equal to 1.0.3 and before 3.5.0, passing HTML containing <option> elements from untrusted sources - even after sanitizing it - to one of jQuery's DOM manipulation methods (i.e. .html(), .append(), and others) may execute untrusted code. This problem is patched in jQuery 3.5.0.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        sonatype-2018-0607

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        sonatype-2020-0187

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

sonatype-lift[bot] avatar Aug 17 '22 12:08 sonatype-lift[bot]

pkg:maven/io.nextflow/[email protected]

1 Critical, 9 Severe, 0 Moderate, 0 Unknown vulnerabilities have been found across 1 dependencies

Components
    pkg:maven/io.nextflow/[email protected]
      CRITICAL Vulnerabilities (1)

        Uncontrolled Resource Consumption

        moment is a JavaScript date library for parsing, validating, manipulating, and formatting dates. Affected versions of moment were found to use an inefficient parsing algorithm. Specifically using string-to-date parsing in moment (more specifically rfc2822 parsing, which is tried by default) has quadratic (N^2) complexity on specific inputs. Users may notice a noticeable slowdown is observed with inputs above 10k characters. Users who pass user-provided strings without sanity length checks to moment constructor are vulnerable to (Re)DoS attacks. The problem is patched in 2.29.4, the patch can be applied to all affected versions with minimal tweaking. Users are advised to upgrade. Users unable to upgrade should consider limiting date lengths accepted from user input.

        CVSS Score: 7.5

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H

        CWE: CWE-400

      SEVERE Vulnerabilities (9)
        sonatype-2021-0900

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        CVSS Score: 6.5

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:U/C:N/I:H/A:N

        CWE: CWE-79

        CVE-2018-14040

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        In Bootstrap before 4.1.2, XSS is possible in the collapse data-parent attribute.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        CVE-2018-14041

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        In Bootstrap before 4.1.2, XSS is possible in the data-target property of scrollspy.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        CVE-2018-14042

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        In Bootstrap before 4.1.2, XSS is possible in the data-container property of tooltip.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        CVE-2019-11358

        Improperly Controlled Modification of Object Prototype Attributes ('Prototype Pollution')

        jQuery before 3.4.0, as used in Drupal, Backdrop CMS, and other products, mishandles jQuery.extend(true, {}, ...) because of Object.prototype pollution. If an unsanitized source object contained an enumerable proto property, it could extend the native Object.prototype.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-1321

        CVE-2019-8331

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        In Bootstrap before 3.4.1 and 4.3.x before 4.3.1, XSS is possible in the tooltip or popover data-template attribute.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        CVE-2020-11023

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        In jQuery versions greater than or equal to 1.0.3 and before 3.5.0, passing HTML containing <option> elements from untrusted sources - even after sanitizing it - to one of jQuery's DOM manipulation methods (i.e. .html(), .append(), and others) may execute untrusted code. This problem is patched in jQuery 3.5.0.

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        sonatype-2018-0607

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.0/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

        sonatype-2020-0187

        Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

        CVSS Score: 6.1

        CVSS Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

        CWE: CWE-79

sonatype-lift[bot] avatar Aug 18 '22 12:08 sonatype-lift[bot]

One thing we haven't tested here is if cpus and/or memory are also closures. In that case, will they be evaluated before clusterOptions? Or will they be evaluated on-the-fly when the clusterOptions invokes task.cpus?

bentsherman avatar Nov 21 '22 17:11 bentsherman

@bentsherman How can I test this on a production environment? If you give me some hints about how to run this exact revision I can run some tests on our current environment and return the outputs here. I can try some closures.

Mxrcon avatar Nov 22 '22 15:11 Mxrcon

  1. clone the nextflow repo
  2. run make compile
  3. use ./launch.sh as a drop-in replacement for the nextflow binary, e.g. ./launch.sh run test.nf
  4. run a test pipeline with dynamic cpus and clusterOptions

something like:

cpus = { 2 * task.attempt }
clusterOptions = { "-l nodes=1:${params.architecture}:ppn=${task.cpus}" }

bentsherman avatar Nov 22 '22 17:11 bentsherman

Commands used:

git clone https://github.com/nextflow-io/nextflow
git fetch origin
git checkout -b ben/pbspro-cluster-options
cd nextflow 
make compile
cd ..
./nextflow/launch.sh main.nf -c nextflow.config 

qstat output:

Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------

33664.hpcnod11          xxxxxxxxxx sabmi    nf-FOO            48529     1      2       4gb       --  R  00:00:00   hpcnod03/0-1


.command.run header:

#PBS -l nodes=1:ppn=2
#PBS -l mem=4gb
#PBS -l nodes=1:x86:ppn=2

Nextflow config:

process{
    executor= "pbs"
    queue= "sabmi"
    clusterOptions = {"-l nodes=1:${task.ext.arch}:ppn=${task.cpus}"}

   withName: "FOO" {
    ext.arch = "x86"    
    cpus = { 2 * task.attempt }
    memory = "4 GB"  
    }
 }

executor.queueSize = 15
executor.pollInterval = '5 sec'

The closure is working as intended but unfortunately we still have 2 headers specifying cpus on that.

Mxrcon avatar Nov 22 '22 18:11 Mxrcon

@Mxrcon , the problem is with the git commands

  • This command only switches to a new local branch (and does not pull the changes in the corresponding remote/branch)
$ git checkout -b ben/pbspro-cluster-options
Switched to a new branch 'ben/pbspro-cluster-options'

  • Use the git log --oneline to check history and it shows that only the commits from master have been pulled
git log --oneline 
811e7ca8e (HEAD -> master, origin/master, origin/HEAD, ben/pbspro-cluster-options) Add support for Java 19
0884e80e6 Add support for custom conda channels (#3435) [ci fast]
1ed2640a8 Add time directive to AWS Batch, clean language (#3436) [ci skip]
ab5bd81b9 Update err message [ci fast]
68b45c929 Fix Flux executor config (#3432) [ci fast]
...
...

Solution-1 (use gh CLI - if you can setup on the cluster)

The CLI would do the following

  • Set the upstream for the branch
  • create a local branch and switch to it
  • Sync with the remote branch
$ gh pr checkout 3015
Branch 'ben/pbspro-cluster-options' set up to track remote branch 'ben/pbspro-cluster-options' from 'origin'.
Switched to a new branch 'ben/pbspro-cluster-options'

Solution-2 (use vanilla git)

  • Set the upstream for the branch
$ git branch --set-upstream-to=origin/ben/pbspro-cluster-options ben/pbspro-cluster-options
  • Pull the changes from the upstream branch (this PR)
$ git pull --rebase
  • Confirm Ben's last commit
$ git log --oneline | grep 'cbe6ef354'                              
cbe6ef354 Update executor.rst [ci skip]

@Mxrcon , could you please try again when you're at IEC? Looking forward to your response.

abhi18av avatar Nov 28 '22 11:11 abhi18av

Thanks for the correction @abhi18av, It's always good to have some git-fu lessons.

Now I got it working.

  • Updated .command.run header:
#PBS -N nf-FOO
#PBS -o xxxxxxx/pbs_directive_work/work/fd/2bfb91328ae6a1adc9f517d647d902/.command.log
#PBS -j oe
#PBS -q sabmi
#PBS -l nodes=1:x86:ppn=2
#PBS -l mem=4gb

  • Qstat output:
Job ID                  Username    Queue    Jobname          SessID  NDS   TSK   Memory      Time    S   Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------

33771.hpcnod11          xxxxxxxxxxx sabmi    nf-FOO            17008     1      2       4gb       --  C       --    hpcnod16/0-1 

Thank you @bentsherman, This implementation seems to fix the issue even when using closures!!

Also, the warn is very nice: WARN: cpus directive is ignored when clusterOptions contains -l option tip: clusterOptions = { "-l nodes=1:ppn=${task.cpus}:..." }

Mxrcon avatar Nov 28 '22 14:11 Mxrcon

@abhi18av @bentsherman @pditommaso, I extended this test running nf-core mag on test config to seem if this could be used for already existing pipelines.

Command used: ./nextflow/launch.sh run nf-core/mag -profile test,docker --outdir mag_results -c nextflow.config

nextflow.config content:

process{
    executor= "pbs"
    queue= "sabmi"
    clusterOptions = {"-l nodes=1:hpcnod16:ppn=${task.cpus}"}
 }

executor.queueSize = 15
executor.pollInterval = '5 sec'

Output:

Completed at: 28-Nov-2022 12:55:46
Duration    : 10m 12s
CPU hours   : 1.3 (0% failed)
Succeeded   : 156
Ignored     : 1
Failed      : 1

Mxrcon avatar Nov 28 '22 15:11 Mxrcon

Hi team,

This thing came up in another context where another university has docker installed only on a single node and then its a hard requirement to rely upon -l cluster options to be able to send jobs to only that node.

Request for comments: Till the moment that this particular PR is merged, I'm wondering if there is a gotcha on using a custom build for nextflow using make pack?

1. Checkout this branch locally

gh pr checkout 3015
Branch 'ben/pbspro-cluster-options' set up to track remote branch 'ben/pbspro-cluster-options' from 'origin'.
Switched to a new branch 'ben/pbspro-cluster-options'

2. Clean up any previously built assets

$~/d/p/nextflow (ben/pbspro-cluster-options)> make clean
rm -rf .nextflow*
rm -rf work
rm -rf modules/nextflow/.nextflow*
rm -rf modules/nextflow/work
rm -rf build
rm -rf modules/*/build
rm -rf plugins/*/build
./gradlew clean
Starting a Gradle Daemon, 1 busy Daemon could not be reused, use --status for details

BUILD SUCCESSFUL in 10s
24 actionable tasks: 1 executed, 23 up-to-date

3. Rely upon the make pack utility which produces a bundled Nextflow version

$~/d/p/nextflow (ben/pbspro-cluster-options)> make pack
BUILD_PACK=1 ./gradlew packAll

> Task :nextflow:compileGroovy
Note: /home/abhinav/data/projects/nextflow/modules/nextflow/src/main/groovy/nextflow/sort/BigSort.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :plugins:nf-amazon:compileGroovy
Note: Some input files use or override a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
Note: /home/abhinav/data/projects/nextflow/plugins/nf-amazon/src/main/com/upplication/s3fs/S3FileSystemProvider.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.

> Task :packAll
+ Nextflow package `ALL` copied to: /home/abhinav/data/projects/nextflow/build/releases


Deprecated Gradle features were used in this build, making it incompatible with Gradle 8.0.

You can use '--warning-mode all' to show the individual deprecation warnings and determine if they come from your own scripts or plugins.

See https://docs.gradle.org/7.6/userguide/command_line_interface.html#sec:command_line_warnings

BUILD SUCCESSFUL in 45s
30 actionable tasks: 17 executed, 2 from cache, 11 up-to-date

4. Check that the bundled version is created

$~/d/p/nextflow (ben/pbspro-cluster-options)> ls /home/abhinav/data/projects/nextflow/build/releases
nextflow-22.12.0-edge-all*


$~/d/p/nextflow (ben/pbspro-cluster-options)> /home/abhinav/data/projects/nextflow/build/releases/nextflow-22.12.0-edge-all info
  Version: 22.12.0-edge build 5832
  Created: 16-12-2022 09:49 UTC (11:49 SAST)
  System: Linux 5.15.0-43-generic
  Runtime: Groovy 3.0.13 on OpenJDK 64-Bit Server VM 17.0.4+8-LTS
  Encoding: UTF-8 (UTF-8)

5. Rename that version to avoid mixing with the official release (nextflow-22.12.0-pbspatch-pr3015-edge-all )

$~/d/p/nextflow (ben/pbspro-cluster-options)> mv /home/abhinav/data/projects/nextflow/build/releases/nextflow-22.12.0-edge-all /home/abhinav/data/projects/nextflow/build/releases/nextflow-22.12.0-pbspatch-pr3015-edge-all

6. Launch the jobs using this custom build of Nextflow with the PBS patch.

abhi18av avatar Dec 21 '22 09:12 abhi18av

@pditommaso , could you kindly let us know if there's anything pending on this PR? ✅

abhi18av avatar Jan 18 '23 16:01 abhi18av

@abhi18av sorry for the late reply. @bentsherman I've made a few changes, please have a look

pditommaso avatar Feb 13 '23 23:02 pditommaso

Changes look good, I would say just revert the tokenize since the space after -l is not required.

bentsherman avatar Feb 14 '23 15:02 bentsherman

still not super happy with .indexOf('-l') because it would catch also any string like --some-option-like-this

pditommaso avatar Feb 14 '23 16:02 pditommaso

I agree. Maybe we could just keep the tokenize and see what happens? 😅

bentsherman avatar Feb 14 '23 16:02 bentsherman

Thanks for picking this back up Paolo and Ben 💚

I think that for the use-cases I have seen, the current state might suffice so perhaps we take this forward and we can circle back after taking the edge release for a spin with more nf-core and custom pipelines?

abhi18av avatar Feb 14 '23 17:02 abhi18av

Abhinav you know that we only deliver very high-quality software! 😆

I think this can be addressed easily using a simple regex to match -l.+ or -l.+ (or something like that)

https://github.com/nextflow-io/nextflow/blob/8f079a9568da039751007b3e034a0d032e566641/modules/nextflow/src/main/groovy/nextflow/executor/PbsProExecutor.groovy#L74-L74

pditommaso avatar Feb 14 '23 17:02 pditommaso

Done fbb4d9a2

pditommaso avatar Feb 14 '23 18:02 pditommaso

Looks good to merge 👍

bentsherman avatar Feb 14 '23 19:02 bentsherman

Abhinav you know that we only deliver very high-quality software! 😆

What happened to user-driven testing for edge/snapshot releases? 😉

======

Looking forward to the edge release - I know of atleast two institutions 🏫 unblocked by this!

CC @Mxrcon

abhi18av avatar Feb 15 '23 06:02 abhi18av

They will come!

pditommaso avatar Feb 16 '23 08:02 pditommaso

Included in version 23.02.0-edge

pditommaso avatar Feb 21 '23 19:02 pditommaso

Looking forward to the edge release - I know of atleast two institutions school unblocked by this!

Included in version 23.02.0-edge

Thank you @pditommaso, I'll test on our production cluster and take some nf-core pipelines for testing

Mxrcon avatar Feb 22 '23 12:02 Mxrcon