sarek icon indicating copy to clipboard operation
sarek copied to clipboard

Light refactoring

Open maxulysse opened this issue 1 year ago • 5 comments

The linter is failing, but that is a bug in the linter which has been solved in the dev-version of the tool.

PR checklist

  • [ ] This comment contains a description of changes (with reason).
  • [ ] If you've fixed a bug or added code that should be tested, add tests!
  • [ ] If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • [ ] If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • [ ] Make sure your code lints (nf-core lint).
  • [ ] Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • [ ] Usage Documentation in docs/usage.md is updated.
  • [ ] Output Documentation in docs/output.md is updated.
  • [ ] CHANGELOG.md is updated.
  • [ ] README.md is updated (including new tool citations and authors/contributors).

maxulysse avatar Feb 23 '24 11:02 maxulysse

nf-core lint overall result: Passed :white_check_mark: :warning:

Posted for pipeline commit 100f9ea

+| ✅ 181 tests passed       |+
#| ❔  14 tests were ignored |#
!| ❗   3 tests had warnings |!

:heavy_exclamation_mark: Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

:grey_question: Tests ignored:

  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: conf/modules.config
  • files_exist - File is ignored: lib/WorkflowMain.groovy
  • files_exist - File is ignored: lib/NfcoreTemplate.groovy
  • files_exist - File is ignored: lib/WorkflowSarek.groovy
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_light.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_dark.png
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_ci - actions_ci
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/sarek/sarek/.github/workflows/awstest.yml
  • template_strings - template_strings

:white_check_mark: Tests passed:

Run details

  • nf-core/tools version 2.13.1
  • Run at 2024-04-03 17:05:37

github-actions[bot] avatar Feb 23 '24 11:02 github-actions[bot]

So far this PR seems like it is adding some stuff and not refactoring 😄 Anyways, linter says

"Param modules_testdata_base_path from nextflow config not found in nextflow_schema.json"

asp8200 avatar Feb 23 '24 11:02 asp8200

sorry, this is an ongoing PR, I should have put it as a draft

maxulysse avatar Feb 23 '24 11:02 maxulysse

sorry still need to fix tumor_normal_pair test

maxulysse avatar Mar 11 '24 15:03 maxulysse

There's a nf-core linting failure that needs to be addressed before review:

nextflow_config - Config default value incorrect: params.aligner is set as bwa-mem in nextflow_schema.json but is null in nextflow.config.

kenibrewer avatar Mar 12 '24 13:03 kenibrewer

In 4575cba, then full-test is still not running on awsbatch :-/ I still get:

ERROR ~ Error executing process > 'NFCORE_SAREK:PREPARE_INTERVALS:CREATE_INTERVALS_BED (wgs_calling_regions.hg38.bed)'

Caused by:
  The user value contains invalid characters. Enter a value that matches the pattern ^([a-z0-9_][a-z0-9_-]{0,30})$

Command executed:

  awk -vFS="    " '{
      t = $5  # runtime estimate
      if (t == "") {
          # no runtime estimate in this row, assume default value
          t = ($3 - $2) / 200000
      }
      if (name == "" || (chunk > 600 && (chunk + t) > longest * 1.05)) {
          # start a new chunk
          name = sprintf("%s_%d-%d.bed", $1, $2+1, $3)
          chunk = 0
          longest = 0
      }
      if (t > longest)
          longest = t
      chunk += t
      print $0 > name
  }' wgs_calling_regions.hg38.bed

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:PREPARE_INTERVALS:CREATE_INTERVALS_BED":
      gawk: $(awk -Wversion | sed '1!d; s/.*Awk //; s/,.*//')
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

(The full-tests also fails no the dev-branch, but due to other bugs which have now been fixed in this PR.)

The "small" test-profile also fail on 4575cba with a similar error but from another process:

ERROR ~ Error executing process > 'NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED (wgs_calling_regions.hg38)'

Caused by:
  The user value contains invalid characters. Enter a value that matches the pattern ^([a-z0-9_][a-z0-9_-]{0,30})$

Command executed:

  bgzip  --threads 1 -c  wgs_calling_regions.hg38.bed > wgs_calling_regions.hg38.bed.gz
  tabix  wgs_calling_regions.hg38.bed.gz

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_SAREK:PREPARE_INTERVALS:TABIX_BGZIPTABIX_INTERVAL_COMBINED":
      tabix: $(echo $(tabix -h 2>&1) | sed 's/^.*Version: //; s/ .*$//')
  END_VERSIONS

Command exit status:
  -

Command output:
  (empty)

Seemingly similar issue reported here, but the fix proposed by @bentsherman didn't solve it.

clue: The "small" test-profile can run with awsbatch on the dev-branch (at least in one mamba-env), which seem to indicate that it is something in this PR that is causing the problem with awsbatch.

asp8200 avatar Mar 21 '24 08:03 asp8200

@asp8200 Could the problem maybe be happening when we try to assign a tag to the job? Maybe AWS batch only supports tags of a certain length, or the files for the AWS Batch dataset are too long / contain illegal characters.

kenibrewer avatar Mar 21 '24 13:03 kenibrewer

@maxulysse What still remains to be done on this PR? This is a pretty substantial bit of refactoring, and I think it would be good to get this merged back into dev sooner rather than later.

Also, as general practice for nf-core development, is it acceptable for major rewrites like this one to merge a mostly correct set of changes into dev and then address the remaining failures as PRs to that branch?

kenibrewer avatar Mar 27 '24 19:03 kenibrewer

@maxulysse What still remains to be done on this PR? This is a pretty substantial bit of refactoring, and I think it would be good to get this merged back into dev sooner rather than later.

Also, as general practice for nf-core development, is it acceptable for major rewrites like this one to merge a mostly correct set of changes into dev and then address the remaining failures as PRs to that branch?

I agree with you @kenibrewer, We got a bit sidetracked, especially me, due to hackathon + team retreat. Let's merge that ASAP and release when we can.

maxulysse avatar Apr 03 '24 16:04 maxulysse

LGTM. However, we still need to fix the runOptions/containerOptions-issue so that the pipeline can run over awsbatch. We do have a fix for that but it isn't great:

https://nextflow.slack.com/archives/C02T98A23U7/p1711040612294759?thread_ts=1711013350.318419&cid=C02T98A23U7

https://github.com/nf-core/sarek/pull/1430/files

yeah, but that's another issue

maxulysse avatar Apr 04 '24 07:04 maxulysse