flowcraft

Handle single fastq files as inputs

Open tiagofilipe12 opened this issue 6 years ago • 8 comments

Right now, assemblerflow accepts only paired-end read files (FASTQ); however, it would be handy to add support for single FASTQ files.

tiagofilipe12 avatar Apr 13 '18 13:04 tiagofilipe12

I agree that this would be a good thing to support. From the point of view of assemblerflow, the required modification is trivial. To allow for both single and paired-end data, the fromFilePairs channel could be defined as:

Channel.fromFilePairs(params.reads, size: params.singleEnd ? 1 : 2, type: 'file')

However, the majority of the modifications would be in the template scripts themselves, which are mostly designed for paired-end data. Moreover, we would need to require that templates using FASTQ data support both single and paired-end files whenever possible.

When that isn't possible, it should be stated explicitly in the documentation of the component using the template (perhaps along with a check in pipelines using those components that prevents the input of single-end data).
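
The kind of pipeline-level check suggested above could look roughly like the following. This is only a minimal sketch: the registry, the component names, and the function name are all hypothetical and are not taken from assemblerflow.

```python
# Hypothetical sketch: none of these names come from assemblerflow itself.

# Assumed registry: component name -> whether it accepts single-end input.
SUPPORTS_SINGLE_END = {
    "component_a": True,
    "component_b": False,
}


def check_single_end_support(components, single_end):
    """Raise ValueError if single-end input is requested for a pipeline
    that contains components which only handle paired-end data."""
    if not single_end:
        # Paired-end input is assumed to be supported by every component.
        return
    # Components missing from the registry are treated as unsupported.
    unsupported = [c for c in components if not SUPPORTS_SINGLE_END.get(c, False)]
    if unsupported:
        raise ValueError(
            "Single-end input is not supported by: " + ", ".join(unsupported)
        )
```

Treating unknown components as unsupported is a conservative design choice here; a real implementation might prefer to read this flag from each component's own metadata.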

ODiogoSilva avatar Apr 14 '18 17:04 ODiogoSilva

Also, a statement like this can be used in each process template that needs to handle both paired-end and single-end inputs.

tiagofilipe12 avatar Apr 14 '18 19:04 tiagofilipe12

This will be highly dependent on how the software handles paired-end data and on whether you will be using subprocess in Python or using bash directly.

On Sat, 14 Apr 2018, 20:05 Tiago Jesus wrote:

Also, a statement like this https://github.com/ODiogoSilva/assemblerflow/blob/patlas/assemblerflow/generator/templates/mapping_patlas.nf#L21-L26 can be used in each process template that needs to handle both paired-end and single-end inputs.


ODiogoSilva avatar Apr 14 '18 20:04 ODiogoSilva

Yes, my point is precisely that we can leave that handling to the process itself or to the Python script, as you say.

tiagofilipe12 avatar Apr 14 '18 21:04 tiagofilipe12

I've been receiving requests to support single FASTQ files as input. @tiagofilipe12's example is no longer available, but I like @ODiogoSilva's solution of having both possibilities in each template. A simple solution is to duplicate the scripts and adjust them accordingly.

cimendes avatar Jun 22 '18 12:06 cimendes

Double the scripts, double the maintenance effort. It seems easier to have the FASTQ channel accept both paired and single reads and then add a condition in the templates that depends on the number of FASTQ files received.
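
A condition of that kind could be sketched in a Python template as below. It assumes the template receives the file names as one whitespace-separated string; the function name `parse_fastq_input` is hypothetical and not part of any existing template.

```python
def parse_fastq_input(raw):
    """Split a whitespace-separated list of FASTQ paths and report
    whether it describes single-end or paired-end input.

    Returns a (mode, files) tuple where mode is 'single' or 'paired'.
    """
    files = raw.split()
    if len(files) == 1:
        return "single", files
    if len(files) == 2:
        return "paired", files
    # Anything else is neither a single file nor a pair.
    raise ValueError(f"Expected 1 or 2 FASTQ files, got {len(files)}")
```

Each template would then branch once on the returned mode, so a single script serves both input types.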

ODiogoSilva avatar Jun 22 '18 12:06 ODiogoSilva

I agree. I was just thinking about tools where you have to pass each FASTQ file through a different parameter, or that have completely different parameters depending on whether you're working with paired-end or single FASTQ files.
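
As an illustration of such a tool, bowtie2 takes paired files through `-1` and `-2` but unpaired reads through `-U`. The thread does not name a specific tool, so bowtie2 is used here purely as an example, and the helper names are hypothetical. A minimal sketch of building the command for either case:

```python
import subprocess


def build_bowtie2_cmd(index, fastq_files, sam_out):
    """Build a bowtie2 argument list for single-end or paired-end input."""
    cmd = ["bowtie2", "-x", index]
    if len(fastq_files) == 1:
        # Unpaired reads are passed with -U.
        cmd += ["-U", fastq_files[0]]
    elif len(fastq_files) == 2:
        # Each mate file goes in its own parameter.
        cmd += ["-1", fastq_files[0], "-2", fastq_files[1]]
    else:
        raise ValueError(f"Expected 1 or 2 FASTQ files, got {len(fastq_files)}")
    cmd += ["-S", sam_out]
    return cmd


def run_bowtie2(index, fastq_files, sam_out):
    """Run bowtie2 and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_bowtie2_cmd(index, fastq_files, sam_out), check=True)
```

Keeping the command construction separate from the `subprocess` call makes the branching easy to test without having the tool installed.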

cimendes avatar Jun 22 '18 13:06 cimendes

Does Flowcraft support interleaved paired-end FASTQ files, for example as output by seqtk mergepe? That's my preferred format, by far, for paired-end reads. I opened issue https://github.com/assemblerflow/flowcraft/issues/136 for this feature request.
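
For context, an interleaved file stores the two mates of each pair as consecutive records. The sketch below shows how such input could be split back into two mate lists; it assumes unwrapped four-line FASTQ records, and the function names are hypothetical rather than anything Flowcraft provides.

```python
from itertools import islice


def read_records(lines):
    """Yield FASTQ records as 4-line tuples (assumes unwrapped records)."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("Truncated FASTQ record")
        yield record


def deinterleave(lines):
    """Split interleaved records into (mate1_records, mate2_records)."""
    records = list(read_records(lines))
    if len(records) % 2 != 0:
        raise ValueError("Odd number of records in interleaved input")
    # Even positions hold the first mates, odd positions the second mates.
    return records[0::2], records[1::2]
```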

sjackman avatar Sep 30 '18 22:09 sjackman