modules icon indicating copy to clipboard operation
modules copied to clipboard

Picard/ADDORREPLACEREADGROUPS: making args conform to other packages; adding optional index output

Open cmatKhan opened this issue 2 years ago • 5 comments

  • modified picard_addorreplacedreadgroups to parse the ext.args parameter similarly to how other modules parse it

  • added an optional emit channel for the index, in the event the flag CREATE_INDEX is passed

  • updated the meta.yaml to reflect the additional optional emit channel, and added myself to the authors list

  • [ x] This comment contains a description of changes (with reason).

cmatKhan avatar Jul 27 '22 19:07 cmatKhan

OK -- my bad. I didn't update the test.yml. It is updated now...I think it will pass?

cmatKhan avatar Aug 01 '22 19:08 cmatKhan

Caused by:
  Process `test_picard_addorr--eplacereadgroups:PICARD_ADDORREPLACEREADGROUPS (test)` terminated with an error exit status (1)

Command executed:

  picard \
      AddOrReplaceReadGroups \
      -Xmx3g \
      --INPUT test.paired_end.bam \
      --OUTPUT test.bam \
  
  
  cat <<-END_VERSIONS > versions.yml
  "test_picard_addorreplacereadgroups:PICARD_ADDORREPLACEREADGROUPS":
      picard AddOrReplaceReadGroups: $(picard AddOrReplaceReadGroups --version 2>&1 | grep -o -E '[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  
  --RGDT,-DT <Iso8601Date>      Read-Group run date  Default value: null. 
  
  --RGFO,-FO <String>           Read-Group flow order  Default value: null. 
  
  --RGID,-ID <String>           Read-Group ID  Default value: 1. 
  
  --RGKS,-KS <String>           Read-Group key sequence  Default value: null. 
  
  --RGPG,-PG <String>           Read-Group program group  Default value: null. 
  
  --RGPI,-PI <Integer>          Read-Group predicted insert size  Default value: null. 
  
  --RGPM,-PM <String>           Read-Group platform model  Default value: null. 
  
  --SORT_ORDER,-SO <SortOrder>  Optional sort order to output in. If not supplied OUTPUT is in the same order as INPUT. 
                                Default value: null. Possible values: {unsorted, queryname, coordinate, duplicate,
                                unknown} 
  
  --TMP_DIR <File>              One or more directories with space available to be used by this program for temporary
                                storage of working files  This argument may be specified 0 or more times. Default value:
                                null. 
  
  --USE_JDK_DEFLATER,-use_jdk_deflater <Boolean>
                                Use the JDK Deflater instead of the Intel Deflater for writing compressed output  Default
                                value: false. Possible values: {true, false} 
  
  --USE_JDK_INFLATER,-use_jdk_inflater <Boolean>
                                Use the JDK Inflater instead of the Intel Inflater for reading compressed input  Default
                                value: false. Possible values: {true, false} 
  
  --VALIDATION_STRINGENCY <ValidationStringency>
                                Validation stringency for all SAM files read by this program.  Setting stringency to
                                SILENT can improve performance when processing a BAM file in which variable-length data
                                (read, qualities, tags) do not otherwise need to be decoded.  Default value: STRICT.
                                Possible values: {STRICT, LENIENT, SILENT} 
  
  --VERBOSITY <LogLevel>        Control verbosity of logging.  Default value: INFO. Possible values: {ERROR, WARNING,
                                INFO, DEBUG} 
  
  --version <Boolean>           display the version number for this tool  Default value: false. Possible values: {true,
                                false} 
  
  
  Advanced Arguments:
  
  --showHidden <Boolean>        display hidden arguments  Default value: false. Possible values: {true, false} 
  
  
  Argument RGLB was missing: Argument 'RGLB' is required

Work dir:
  /home/runner/pytest_workflow_mtesp126/picard_addorreplacereadgroups_test_picard_addorreplacereadgroups/work/bb/9f1123eb57cd828bebec58e11649e5

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

OK, so now looking at the docs, some of the RG* are required, (but not all the ones in the original module).

I would argue (sorry to half change my mind), that as these are mandatory, they should go in as input channels. However I would equally make them 'required' in teh module - as in if not supplied in the input cahnnel the module should error.

What do you think @FriederikeHanssen ?

jfy133 avatar Aug 02 '22 06:08 jfy133

Caused by:
  Process `test_picard_addorr--eplacereadgroups:PICARD_ADDORREPLACEREADGROUPS (test)` terminated with an error exit status (1)

Command executed:

  picard \
      AddOrReplaceReadGroups \
      -Xmx3g \
      --INPUT test.paired_end.bam \
      --OUTPUT test.bam \
  
  
  cat <<-END_VERSIONS > versions.yml
  "test_picard_addorreplacereadgroups:PICARD_ADDORREPLACEREADGROUPS":
      picard AddOrReplaceReadGroups: $(picard AddOrReplaceReadGroups --version 2>&1 | grep -o -E '[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  
  --RGDT,-DT <Iso8601Date>      Read-Group run date  Default value: null. 
  
  --RGFO,-FO <String>           Read-Group flow order  Default value: null. 
  
  --RGID,-ID <String>           Read-Group ID  Default value: 1. 
  
  --RGKS,-KS <String>           Read-Group key sequence  Default value: null. 
  
  --RGPG,-PG <String>           Read-Group program group  Default value: null. 
  
  --RGPI,-PI <Integer>          Read-Group predicted insert size  Default value: null. 
  
  --RGPM,-PM <String>           Read-Group platform model  Default value: null. 
  
  --SORT_ORDER,-SO <SortOrder>  Optional sort order to output in. If not supplied OUTPUT is in the same order as INPUT. 
                                Default value: null. Possible values: {unsorted, queryname, coordinate, duplicate,
                                unknown} 
  
  --TMP_DIR <File>              One or more directories with space available to be used by this program for temporary
                                storage of working files  This argument may be specified 0 or more times. Default value:
                                null. 
  
  --USE_JDK_DEFLATER,-use_jdk_deflater <Boolean>
                                Use the JDK Deflater instead of the Intel Deflater for writing compressed output  Default
                                value: false. Possible values: {true, false} 
  
  --USE_JDK_INFLATER,-use_jdk_inflater <Boolean>
                                Use the JDK Inflater instead of the Intel Inflater for reading compressed input  Default
                                value: false. Possible values: {true, false} 
  
  --VALIDATION_STRINGENCY <ValidationStringency>
                                Validation stringency for all SAM files read by this program.  Setting stringency to
                                SILENT can improve performance when processing a BAM file in which variable-length data
                                (read, qualities, tags) do not otherwise need to be decoded.  Default value: STRICT.
                                Possible values: {STRICT, LENIENT, SILENT} 
  
  --VERBOSITY <LogLevel>        Control verbosity of logging.  Default value: INFO. Possible values: {ERROR, WARNING,
                                INFO, DEBUG} 
  
  --version <Boolean>           display the version number for this tool  Default value: false. Possible values: {true,
                                false} 
  
  
  Advanced Arguments:
  
  --showHidden <Boolean>        display hidden arguments  Default value: false. Possible values: {true, false} 
  
  
  Argument RGLB was missing: Argument 'RGLB' is required

Work dir:
  /home/runner/pytest_workflow_mtesp126/picard_addorreplacereadgroups_test_picard_addorreplacereadgroups/work/bb/9f1123eb57cd828bebec58e11649e5

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

OK, so now looking at the docs, some of the RG* are required, (but not all the ones in the original module).

I would argue (sorry to half change my mind), that as these are mandatory, they should go in as input channels. However I would equally make them 'required' in teh module - as in if not supplied in the input cahnnel the module should error.

What do you think @FriederikeHanssen ?

If they are not files I wouldn't pass them as channels but strictly stay with passing them as args. This what I have normally seen everywhere. I also think it is a lot cleaner to have all args in one place in the modules.config later on a pipeline level. Makes it easier to see IMO what you actually plan on passing into the process.

FriederikeHanssen avatar Aug 02 '22 12:08 FriederikeHanssen

Ok In this case @cmatKhan you will need to add the RG IDs to the nexyfkow.config file in the test directory of the module

jfy133 avatar Aug 02 '22 13:08 jfy133

ugh -- of course. I had this in a config outside of the test module:

    withName: 'PICARD_ADDORREPLACEREADGROUPS'{
        ext.args = {[
            "-LB ${meta.id}",
            "-PL ILLUMINA",
            "-PU bc1",
            "-SM ${meta.id}"
        ].join(' ').trim()}
    }

which isn't terribly helpful. It is added to the test nextflow.config now.

cmatKhan avatar Aug 03 '22 13:08 cmatKhan

CI Test failure

Aug-30 06:16:07.285 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'test_picard_addorreplacereadgroups:PICARD_ADDORREPLACEREADGROUPS (test)'

Caused by:
  Process `test_picard_addorreplacereadgroups:PICARD_ADDORREPLACEREADGROUPS (test)` terminated with an error exit status (1)

Command executed:

  picard \
      -Xmx3g \
      AddOrReplaceReadGroups \
      -LB test -PL ILLUMINA -PU bc1 -SM test \
      --INPUT test.paired_end.bam \
      --OUTPUT test.bam \
      -LB test -PL ILLUMINA -PU bc1 -SM test
  
  cat <<-END_VERSIONS > versions.yml
  "test_picard_addorreplacereadgroups:PICARD_ADDORREPLACEREADGROUPS":
      picard: $(picard AddOrReplaceReadGroups --version 2>&1 | grep -o -E '[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  
  --RGDT,-DT <Iso8601Date>      Read-Group run date  Default value: null. 
  
  --RGFO,-FO <String>           Read-Group flow order  Default value: null. 
  
  --RGID,-ID <String>           Read-Group ID  Default value: 1. 
  
  --RGKS,-KS <String>           Read-Group key sequence  Default value: null. 
  
  --RGPG,-PG <String>           Read-Group program group  Default value: null. 
  
  --RGPI,-PI <Integer>          Read-Group predicted insert size  Default value: null. 
  
  --RGPM,-PM <String>           Read-Group platform model  Default value: null. 
  
  --SORT_ORDER,-SO <SortOrder>  Optional sort order to output in. If not supplied OUTPUT is in the same order as INPUT. 
                                Default value: null. Possible values: {unsorted, queryname, coordinate, duplicate,
                                unknown} 
  
  --TMP_DIR <File>              One or more directories with space available to be used by this program for temporary
                                storage of working files  This argument may be specified 0 or more times. Default value:
                                null. 
  
  --USE_JDK_DEFLATER,-use_jdk_deflater <Boolean>
                                Use the JDK Deflater instead of the Intel Deflater for writing compressed output  Default
                                value: false. Possible values: {true, false} 
  
  --USE_JDK_INFLATER,-use_jdk_inflater <Boolean>
                                Use the JDK Inflater instead of the Intel Inflater for reading compressed input  Default
                                value: false. Possible values: {true, false} 
  
  --VALIDATION_STRINGENCY <ValidationStringency>
                                Validation stringency for all SAM files read by this program.  Setting stringency to
                                SILENT can improve performance when processing a BAM file in which variable-length data
                                (read, qualities, tags) do not otherwise need to be decoded.  Default value: STRICT.
                                Possible values: {STRICT, LENIENT, SILENT} 
  
  --VERBOSITY <LogLevel>        Control verbosity of logging.  Default value: INFO. Possible values: {ERROR, WARNING,
                                INFO, DEBUG} 
  
  --version <Boolean>           display the version number for this tool  Default value: false. Possible values: {true,
                                false} 
  
  
  Advanced Arguments:
  
  --showHidden <Boolean>        display hidden arguments  Default value: false. Possible values: {true, false} 
  
  
  Illegal argument value: Argument 'SM/RGSM' cannot be specified more than once.

Work dir:
  /home/runner/pytest_workflow_6kbqt95g/picard_addorreplacereadgroups_test_picard_addorreplacereadgroups/work/b2/a6da114b235a38d9af3585b9ea1140

jfy133 avatar Aug 30 '22 06:08 jfy133

@FriederikeHanssen . I've merged the latest master and fixed the CI failures. I think all of your comments have now been addressed ?

muffato avatar Oct 03 '22 23:10 muffato

Yes looks good. However let's NOT merge until this is in: https://github.com/nf-core/modules/pull/2141/files (Major restructure of nf-core/modules to allow subworkflows in the future)

FriederikeHanssen avatar Oct 04 '22 07:10 FriederikeHanssen

Oh, I had missed the announcement on Slack, and was still trying to get all the existing PRs in. Of course, yes, let's wait 👍🏼

muffato avatar Oct 04 '22 12:10 muffato

@FriederikeHanssen : should be good for review, now !

muffato avatar Oct 30 '22 19:10 muffato