modules icon indicating copy to clipboard operation
modules copied to clipboard

enhancement: add parameter for fastp to make writing of trimmed fastqs optional

Open anoronh4 opened this issue 3 years ago • 4 comments

Closes #2700. with this PR, by defining the input channel save_trimmed_pass a user can define whether trimmed fastqs are written/saved, or not. This works for all three modes: interleaved, single_end, paired_end. This will help to save on storage when trimming is not necessarily desired but the qc provided by fastp is.

PR checklist

  • [x] This comment contains a description of changes (with reason).
  • [x] If you've fixed a bug or added code that should be tested, add tests!
  • [x] If you've added a new tool - have you followed the module conventions in the contribution docs
  • [x] If necessary, include test data in your PR.
  • [x] Remove all TODO statements.
  • [x] Emit the versions.yml file.
  • [x] Follow the naming conventions.
  • [x] Follow the parameters requirements.
  • [x] Follow the input/output options guidelines.
  • [x] Add a resource label
  • [x] Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • [x] PROFILE=docker pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • [x] PROFILE=singularity pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware
    • [x] PROFILE=conda pytest --tag <MODULE> --symlink --keep-workflow-wd --git-aware

anoronh4 avatar Dec 28 '22 21:12 anoronh4

@maxulysse just wondering if there's any update for moving forward with this PR. I have addressed or responded to each of your review comments, is there anything else i need to do for this PR to be ready?

anoronh4 avatar Feb 08 '23 17:02 anoronh4

sorry, maybe I am not fully understanding the issue. so in the case you are describing no output will be written, just the json and log files?

FriederikeHanssen avatar Feb 14 '23 19:02 FriederikeHanssen

@FriederikeHanssen sorry i did not see your reply until now. yes, json/html and log files for QC purposes. this would be for those who want the QC of fastp but who want to avoid saving large intermediate files because they want to skip actually trimming their reads. this is a mode that fastp supports and mentions in their documentation: https://github.com/OpenGene/fastp#input-and-output

anoronh4 avatar Mar 15 '23 15:03 anoronh4

@Emiller88 so do we want to merge, and rewrite all tests in nf-test?

maxulysse avatar Oct 17 '23 13:10 maxulysse

This seems very broken with a lot merge conflict markers... It would need cleaning up ...

jfy133 avatar Jun 07 '24 03:06 jfy133

Can I get some eyes on this now. I rewrote the tests, as I think they had been written at the start of our nf-test use. I would like to test that the optional channels are indeed empty, but couldn't seem to manage that correctly at the moment. I don't think we need to check the contents of the other fastq files, and they are currently not working for me (commented out lines in the tests)

SPPearce avatar Jun 08 '24 08:06 SPPearce

I guess this is ok but I find that default behaviour is sub optimal.

The primary purpose of fastp is to do the read clean up, so by default I would expect to get those resulting reads.

I think just generating the QC and not using the FASTQ files is a rather odd edge case (imo) and would rather have that behaviour opt in....

jfy133 avatar Jun 08 '24 17:06 jfy133

I guess this is ok but I find that default behaviour is sub optimal.

The primary purpose of fastp is to do the read clean up, so by default I would expect to get those resulting reads.

I think just generating the QC and not using the FASTQ files is a rather odd edge case (imo) and would rather have that behaviour opt in....

I guess you mean that if you leave the channel empty then it will default to false? But the developer has to look at what the channels are to be able to plug the module in.

I'm running the demultiplex pipeline at the moment, and would like to get the QC report from fastp without the actual read trimming, because I'm going to pass them into a proprietary pipeline that is going to trim it anyway. Fastq

SPPearce avatar Jun 08 '24 20:06 SPPearce

I guess this is ok but I find that default behaviour is sub optimal.

The primary purpose of fastp is to do the read clean up, so by default I would expect to get those resulting reads.

I think just generating the QC and not using the FASTQ files is a rather odd edge case (imo) and would rather have that behaviour opt in....

I guess you mean that if you leave the channel empty then it will default to false? But the developer has to look at what the channels are to be able to plug the module in.

I'm running the demultiplex pipeline at the moment, and would like to get the QC report from fastp without the actual read trimming, because I'm going to pass them into a proprietary pipeline that is going to trim it anyway. Fastq

Actually no, more switch it to more like dontsave_trimmed. Agree empty channel is not a nice system, better to be an 'active' choice

Yeah I understand the justification, but I maintain most people would not be using a proprietary or alternative trimmer. The core Fastp purpose is to do the read clean up, it's not just a QC reporter tool like fastqc

jfy133 avatar Jun 09 '24 06:06 jfy133

I would go for a negative argument (skip or dontsave) if it wasn't for the fact the other two channels are positive (save). I don't see a large burden here on the developer, if they are passing reads in with four other empty channnels without looking at the module code/meta then they'll quickly discover it isn't making any reads.

SPPearce avatar Jun 09 '24 18:06 SPPearce

Hmmm... I dunno. I'm still not convinced. Maybe you can ask for a vote on #team-maintainers

jfy133 avatar Jun 10 '24 09:06 jfy133

What about changing the parameter names to write_ instead of save_?

SPPearce avatar Jun 10 '24 11:06 SPPearce

What about discard_fastq?

mahesh-panchal avatar Jun 10 '24 12:06 mahesh-panchal

Following discussion we are going to change from save_trimmed_pass to discard_trimmed_pass. Then running with false, false, false will provide expected behaviour.

SPPearce avatar Jun 10 '24 15:06 SPPearce