ERROR ~ No fastq files provided with pattern: 'pe_*.fastq.gz'

Open sjackman opened this issue 5 years ago • 6 comments

The command

nextflow run spades.nf --fastq 'pe_*.fastq.gz'

produces the error

ERROR ~ No fastq files provided with pattern:'pe_*.fastq.gz'

This command

nextflow run spades.nf --fastq "pe_{1,2}.fastq.gz"

however works as expected. That's not entirely intuitive to me. What's the syntax of the --fastq parameter?

sjackman avatar Oct 01 '18 00:10 sjackman

Hello! The --fastq parameter requires a path expression that matches paired-end fastq files, so that the value returned is the pair of files for a sample. You can find more information on which raw input types are available in flowcraft's documentation: https://flowcraft.readthedocs.io/en/latest/user/pipeline_building.html#rawinput

I hope it helps. 😄

cimendes avatar Oct 01 '18 12:10 cimendes

Just to follow up, the --fastq parameter syntax is a glob pattern that groups pairs of FastQ files into a single nextflow channel emission. It basically feeds into the fromFilePairs channel (https://www.nextflow.io/docs/latest/channel.html?highlight=paired%20end#fromfilepairs).

ODiogoSilva avatar Oct 01 '18 18:10 ODiogoSilva

The glob pattern sample_*.fq.gz matches two FASTQ files.

❯❯❯ echo sample_*.fq.gz
sample_1.fq.gz sample_2.fq.gz

Using this pattern however gives the error ERROR ~ No fastq files provided with pattern:'sample_*.fq.gz'

nextflow run ./spades.nf --fastq='sample_*.fq.gz'
N E X T F L O W  ~  version 0.32.0
Launching `./spades.nf` [berserk_pike] - revision: ca8798b3c5

============================================================
                F L O W C R A F T
============================================================
Built using flowcraft v1.3.0

 Input FastQ                 : 2
 Input samples               : 1
 Reports are found in        : ./reports
 Results are found in        : ./results
 Profile                     : standard

Starting pipeline at Mon Oct 01 12:31:32 PDT 2018

ERROR ~ No fastq files provided with pattern:'sample_*.fq.gz'

 -- Check '.nextflow.log' file for details

Is that the intended behaviour?

sjackman avatar Oct 01 '18 19:10 sjackman

Yes, this is the expected behavior of nextflow's fromFilePairs channel, as @ODiogoSilva said, because it relies on the {1,2} part of the glob pattern to match the two files of a pair with each other. For example, I usually use one of two glob patterns, depending on the input files:

  • *_{1,2}.*
  • *R{1,2}.*
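
To make the difference concrete, here is a minimal Python sketch of the grouping idea. It is a simplified model and not Nextflow's actual implementation: it assumes the grouping key is the file name up to the end of whatever the first `*` matched (with trailing `_`, `-` or `.` removed), and that groups without exactly two files are dropped. The names `glob_to_regex` and `group_pairs` are made up for this illustration, and the sketch only handles patterns that contain a `*`.

```python
import re
from collections import defaultdict


def glob_to_regex(pattern):
    """Translate a small glob subset (*, ?, {a,b}) into a compiled regex.

    Every '*' becomes a capture group; group 1 marks the end of the key.
    """
    out = []
    in_braces = False
    for ch in pattern:
        if ch == "*":
            out.append("(.*)")
        elif ch == "?":
            out.append(".")
        elif ch == "{":
            out.append("(?:")
            in_braces = True
        elif ch == "}":
            out.append(")")
            in_braces = False
        elif ch == "," and in_braces:
            out.append("|")
        else:
            out.append(re.escape(ch))
    return re.compile("".join(out))


def group_pairs(pattern, filenames, size=2):
    """Group matching file names by key; keep only groups of `size` files."""
    if "*" not in pattern:
        raise ValueError("this sketch only models patterns containing '*'")
    regex = glob_to_regex(pattern)
    groups = defaultdict(list)
    for name in sorted(filenames):
        match = regex.fullmatch(name)
        if match is None:
            continue
        # Assumed rule: the key ends where the first '*' match ends.
        key = name[: match.end(1)].rstrip("_-.")
        groups[key].append(name)
    # Incomplete groups are discarded, which can leave nothing at all.
    return {k: v for k, v in groups.items() if len(v) == size}


files = ["sample_1.fq.gz", "sample_2.fq.gz"]
print(group_pairs("sample_*.fq.gz", files))  # {}
print(group_pairs("*_{1,2}.*", files))
# {'sample': ['sample_1.fq.gz', 'sample_2.fq.gz']}
```

Under this model, `sample_*.fq.gz` puts the read number inside the key, so each file ends up alone under `sample_1` or `sample_2` and no pair is formed. With `*_{1,2}.*` the read number sits outside the `*`, so both files share the key `sample`.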

tiagofilipe12 avatar Oct 02 '18 05:10 tiagofilipe12

At the command line both sample_*.fq.gz and sample_{1,2}.fq.gz expand to the same value.

❯❯❯ echo sample_*.fq.gz
sample_1.fq.gz sample_2.fq.gz
❯❯❯ echo sample_{1,2}.fq.gz
sample_1.fq.gz sample_2.fq.gz

Since both glob patterns expand to the same value, I would expect both to work. It's not clear from the documentation that the latter works and the former does not. It would help to document explicitly that --fastq must contain {1,2}, and to provide a more helpful error message when --fastq does not contain {1,2}. https://flowcraft.readthedocs.io/en/latest/user/pipeline_building.html#raw-input-types
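
As a rough illustration of the kind of check suggested here, the following Python sketch tests whether a pattern contains a two-way brace group and builds a more explicit message when it does not. This is not flowcraft's code: the function name `check_fastq_pattern`, the regular expression, and the message wording are all assumptions for the example, and a real check would have to live in the generated Nextflow script.

```python
import re

# Hypothetical rule: a brace group with exactly two alternatives, e.g. {1,2}.
PAIR_GROUP = re.compile(r"\{[^{},]+,[^{},]+\}")


def check_fastq_pattern(pattern):
    """Return None if the pattern has a pair selector, else a hint message."""
    if PAIR_GROUP.search(pattern):
        return None
    return (
        f"No fastq files provided with pattern: '{pattern}'. "
        "The --fastq pattern must contain a pair selector such as {1,2}, "
        "for example 'sample_{1,2}.fq.gz'."
    )


print(check_fastq_pattern("sample_{1,2}.fq.gz"))
print(check_fastq_pattern("sample_*.fq.gz"))
```

The first call returns `None` because the pattern contains `{1,2}`; the second returns the hint text instead of the bare "No fastq files provided" message.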

sjackman avatar Oct 02 '18 17:10 sjackman

I agree with updating the error message. However, as you stated, it should be fine both ways, but it seems to be a limitation of the nextflow channel. In the nextflow documentation it seems possible to use the sample_*.fq.gz glob and use a closure to set the ID of the fastq, but they only provide an example that extracts the extension as the ID. I've asked in the nextflow gitter channel for any other ways of fetching the correct ID, so that we can actually have both globs working.
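
The custom-ID idea can be sketched in Python as grouping with a user-supplied key function. This only models the concept and is not Nextflow code: `strip_read_suffix` and `group_by_key` are hypothetical names, and the assumption that read numbers appear as a trailing `_1`/`_2` or `_R1`/`_R2` before the first dot is mine.

```python
import fnmatch
import re
from collections import defaultdict


def strip_read_suffix(name):
    """Hypothetical key: drop extensions, then a trailing _1/_2 or _R1/_R2."""
    stem = name.split(".", 1)[0]
    return re.sub(r"_R?[12]$", "", stem)


def group_by_key(pattern, filenames, key, size=2):
    """Group names matching a plain glob by key(name); keep full groups only."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        # fnmatchcase supports *, ? and [seq], but not {a,b} brace groups.
        if fnmatch.fnmatchcase(name, pattern):
            groups[key(name)].append(name)
    return {k: v for k, v in groups.items() if len(v) == size}


files = ["sample_1.fq.gz", "sample_2.fq.gz", "other_1.fq.gz"]
print(group_by_key("sample_*.fq.gz", files, strip_read_suffix))
# {'sample': ['sample_1.fq.gz', 'sample_2.fq.gz']}
```

Because the key is computed from the file name rather than from the pattern, a plain `*` glob is enough to pair the files, at the cost of hard-coding a naming convention for the read suffix.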

ODiogoSilva avatar Oct 02 '18 18:10 ODiogoSilva