flowcraft
ERROR ~ No fastq files provided with pattern: 'pe_*.fastq.gz'
The command
nextflow run spades.nf --fastq 'pe_*.fastq.gz'
produces the error
ERROR ~ No fastq files provided with pattern:'pe_*.fastq.gz'
This command
nextflow run spades.nf --fastq "pe_{1,2}.fastq.gz"
however works as expected. That's not entirely intuitive to me. What's the syntax of the --fastq parameter?
Hello! The --fastq parameter requires a path expression that matches paired-end FASTQ files, so that each value returned is the pair of files for one sample. You can find more information on the available raw input types in flowcraft's documentation: https://flowcraft.readthedocs.io/en/latest/user/pipeline_building.html#rawinput
I hope it helps. 😄
Just to follow up, the --fastq parameter syntax is a glob pattern that groups pairs of FastQ files into a single Nextflow channel emission. It is fed directly into the fromFilePairs channel (https://www.nextflow.io/docs/latest/channel.html?highlight=paired%20end#fromfilepairs).
The glob pattern sample_*.fq.gz matches two FASTQ files.
❯❯❯ echo sample_*.fq.gz
sample_1.fq.gz sample_2.fq.gz
Using this pattern however gives the error ERROR ~ No fastq files provided with pattern:'sample_*.fq.gz'
nextflow run ./spades.nf --fastq='sample_*.fq.gz'
N E X T F L O W ~ version 0.32.0
Launching `./spades.nf` [berserk_pike] - revision: ca8798b3c5
============================================================
F L O W C R A F T
============================================================
Built using flowcraft v1.3.0
Input FastQ : 2
Input samples : 1
Reports are found in : ./reports
Results are found in : ./results
Profile : standard
Starting pipeline at Mon Oct 01 12:31:32 PDT 2018
ERROR ~ No fastq files provided with pattern:'sample_*.fq.gz'
-- Check '.nextflow.log' file for details
Is that the intended behaviour?
Yes, this is the expected behavior of Nextflow's fromFilePairs channel, as @ODiogoSilva said, because it uses the {1,2} part of the glob pattern to match the two files of a pair with each other. For example, I usually use one of two glob patterns, depending on the input files:
- *_{1,2}.*
- *R{1,2}.*
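To illustrate why the `*` pattern produces no pairs, here is a simplified Python model of prefix-based pairing. It is a sketch, not Nextflow's actual implementation: the helper names (`compile_pattern`, `grouping_key`, `file_pairs`) are hypothetical, and the assumption is that the grouping key is the file name up to the end of whatever the first `*` matched (or up to the `{...}` group when there is no `*`), with trailing separators trimmed, and that incomplete groups are dropped.

```python
import re
from collections import defaultdict


def compile_pattern(pattern):
    """Translate a tiny glob subset ('*' and '{a,b}') into a regex.

    Returns the compiled regex and the kind of each capture group, in order.
    """
    parts, kinds = [], []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            parts.append("(.*)")
            kinds.append("star")
        elif ch == "{":
            close = pattern.index("}", i)
            alts = pattern[i + 1:close].split(",")
            parts.append("(" + "|".join(re.escape(a) for a in alts) + ")")
            kinds.append("alt")
            i = close
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts)), kinds


def grouping_key(name, pattern):
    """Assumed key rule: keep the name up to the end of the first '*' match,
    or up to the start of the first '{...}' group if there is no '*'."""
    regex, kinds = compile_pattern(pattern)
    match = regex.fullmatch(name)
    if match is None:
        return None
    if "star" in kinds:
        end = match.end(kinds.index("star") + 1)
    elif "alt" in kinds:
        end = match.start(kinds.index("alt") + 1)
    else:
        end = len(name)
    return name[:end].rstrip("_-.")


def file_pairs(names, pattern, size=2):
    """Group matching names by key and keep only groups of exactly `size`."""
    groups = defaultdict(list)
    for name in sorted(names):
        key = grouping_key(name, pattern)
        if key is not None:
            groups[key].append(name)
    return {key: files for key, files in groups.items() if len(files) == size}


files = ["sample_1.fq.gz", "sample_2.fq.gz"]
print(file_pairs(files, "sample_{1,2}.fq.gz"))  # one pair under the key 'sample'
print(file_pairs(files, "sample_*.fq.gz"))      # no complete pair
```

Under this model, `sample_*.fq.gz` lets the `*` swallow the read number, so each file ends up with its own key (`sample_1`, `sample_2`) and no group reaches two files. With `{1,2}` the read number stays outside the key, so both files share `sample`.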
At the command line both sample_*.fq.gz and sample_{1,2}.fq.gz expand to the same value.
❯❯❯ echo sample_*.fq.gz
sample_1.fq.gz sample_2.fq.gz
❯❯❯ echo sample_{1,2}.fq.gz
sample_1.fq.gz sample_2.fq.gz
Since both glob patterns expand to the same value, I would expect both to work. It's not clear from the documentation that the latter works and the former does not. It would help to document explicitly that --fastq must contain {1,2}. A more helpful error message could be provided when --fastq does not contain {1,2}.
https://flowcraft.readthedocs.io/en/latest/user/pipeline_building.html#raw-input-types
I agree with updating the error message. However, as you stated, it should work both ways, but this seems to be a limitation of the Nextflow channel. According to the Nextflow documentation, it seems possible to use the sample_*.fq.gz glob together with a closure that sets the ID of the FASTQ pair, but the only example provided extracts the file extension as the ID. I've asked in the Nextflow Gitter channel whether there are other ways of fetching the correct ID, so that we can actually have both globs working.
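As a rough idea of what such an ID function could compute, here is a Python sketch. It only illustrates the key logic; in Nextflow the equivalent would have to be written as the closure passed to fromFilePairs. The names `sample_id` and `pair_by_sample`, the list of extensions, and the read-suffix convention (`_1`, `_2`, `_R1`, `_R2`, also with `.` or `-` as separator) are assumptions made for illustration, not anything flowcraft or Nextflow defines.

```python
import re
from collections import defaultdict

# Assumed read-number suffix: a separator, an optional 'R', then 1 or 2.
READ_SUFFIX = re.compile(r"[._-]R?[12]$")

# Assumed set of FASTQ extensions, longest first so '.fastq.gz' wins over '.gz'-less forms.
FASTQ_EXTENSIONS = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def sample_id(name):
    """Derive a sample ID by dropping the FASTQ extension and the read suffix."""
    for ext in FASTQ_EXTENSIONS:
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    return READ_SUFFIX.sub("", name)


def pair_by_sample(names):
    """Group file names by sample ID, independent of the glob that matched them."""
    groups = defaultdict(list)
    for name in sorted(names):
        groups[sample_id(name)].append(name)
    return dict(groups)


print(pair_by_sample(["sample_2.fq.gz", "sample_1.fq.gz"]))
```

Because the key is derived from the file name alone, it would not matter whether the files were selected with `sample_*.fq.gz` or `sample_{1,2}.fq.gz`. The trade-off is that the naming convention becomes hard-coded in the ID function instead of being expressed in the user's pattern.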