
fix(bcl_demultiplex): Extract InterOp files from input channel

Open • robsyme opened this issue 4 weeks ago • 4 comments

Extract InterOp files directly from the input channel rather than emitting them from BCLCONVERT/BCL2FASTQ process outputs. This avoids passing files through a process unchanged, allowing downstream consumers to access InterOp data immediately without waiting for demultiplexing to complete.

This also resolves an issue where Fusion could not collect InterOp files nested within staged input directories (nextflow-io/nextflow#5948).

The {,**/} glob pattern is used instead of **/ because Java's glob requires **/ to match at least one directory, but we need to match InterOp directories at both the root level and in subdirectories.
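To make the difference concrete, here is a small check in plain Groovy. It assumes that Nextflow's globbing delegates to Java's PathMatcher, as the description above implies. The globMatcher helper and the file names are hypothetical and used only for illustration; they are not part of the module.

```groovy
import java.nio.file.FileSystems
import java.nio.file.PathMatcher
import java.nio.file.Paths

// Hypothetical helper: compile a pattern with the JVM default filesystem's "glob:" syntax.
PathMatcher globMatcher(String pattern) {
    FileSystems.getDefault().getPathMatcher('glob:' + pattern)
}

// Illustrative paths relative to a run directory.
def rootLevel = Paths.get('InterOp/TileMetricsOut.bin')          // InterOp at the top level
def nested    = Paths.get('run_dir/InterOp/TileMetricsOut.bin')  // InterOp one directory down

def starStar = globMatcher('**/InterOp/*.bin')
def braced   = globMatcher('{,**/}InterOp/*.bin')

// '**/' requires a literal '/' before InterOp, so the root-level file is missed.
println "**/ root-level:    ${starStar.matches(rootLevel)}"   // false
println "**/ nested:        ${starStar.matches(nested)}"      // true
// '{,**/}' adds an empty alternative, so both layouts match.
println "{,**/} root-level: ${braced.matches(rootLevel)}"     // true
println "{,**/} nested:     ${braced.matches(nested)}"        // true
```

Only the root-level path under **/ fails to match. The empty branch of the brace group covers exactly that case, which is why the pattern uses {,**/} rather than listing two separate globs.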

Changes:

  • Remove interop output from BCLCONVERT and BCL2FASTQ modules
  • Remove redundant cp commands that copied InterOp files
  • Extract InterOp files from input channel in BCL_DEMULTIPLEX subworkflow

robsyme, Dec 11 '25 19:12

Thanks for the sanity check, Jon. I'll wait to get a review from @SPPearce before merging.

robsyme, Dec 12 '25 13:12

Very fair point, Simon.

What if we pass the untarred directory to the basecalling processes so they don't have to untar the files themselves? The advantages are:

  1. The untar can happen on a smaller instance/job.
  2. We avoid the copy step inside the basecallers, freeing up the larger instance earlier.
  3. It provides a good model for the nf-core community on how to avoid passing inputs through processes.

robsyme, Dec 12 '25 15:12

I don't actually know how often a tarred input is used in the wild, or whether it is only used for testing because we can't stage a folder ;)

SPPearce, Dec 12 '25 16:12

That makes sense, good to know. Still, it's important to handle both cases (even if only for testing). If the input is a plain ol' directory (the most common expectation), we pull out the InterOp files directly and also pass the directory through into ch_flowcells by mixing it back in:

ch_flowcells = ch_flowcells_tar.samplesheets
    .join(UNTAR.out.untar)
    .mix(ch_flowcells_branched.dir) // <- the unchanged non-tarred directories.

robsyme, Dec 12 '25 16:12