sarek: working with CRAM files split by chromosome

Description of the bug

We are trying to run sarek in a rather unusual setup:

the mappings are stored as CRAM
the mappings are split by chromosome
the mappings are stored on an S3-like system

We tried adding a lane column to the input file so that one sample can have multiple input files.

However, when the lane column is present, sarek asks for bam and bai columns and does not accept cram and crai columns, even though cram and crai columns are allowed for the prepare_recalibration step.

Command used and terminal output

tmp_input.csv and tmp_input2.csv are included in the attachment.

$ nextflow run nf-core/sarek -profile conda --input tmp_input.csv --genome GATK.GRCh38 --outdir ./result --step prepare_recalibration # this run failed
$ nextflow run nf-core/sarek -profile conda --input tmp_input2.csv --genome GATK.GRCh38 --outdir ./result --step prepare_recalibration # this run worked

Relevant files

sarek_issue.zip

System information

Linux 3.10
Conda 25.3.1
nextflow 25.04.3
executor local

Jun 03 '25 14:06 natir

Interesting. So you would like the ability to input split samples at a later step. We need to think about how we can support this without bloating the logic I think. Is there an option for you in the meantime to merge the cram files pre-run? What do you want to do post recalibration?

Jun 10 '25 09:06 FriederikeHanssen

In fact, we want to run the variant calling part of the pipeline on a hundred samples already aligned with bwa-mem2, and we want to avoid realignment.

Obviously, we can work around this bug by merging the CRAM files before running the pipeline.
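
For reference, the merge workaround could look roughly like the sketch below. It is not part of sarek. The function name merge_sample_crams, the DRY_RUN switch, and all file names are hypothetical, and it assumes the per-chromosome CRAM files are available locally, are coordinate-sorted, and were aligned against the same reference FASTA. By default it only prints the samtools commands it would run.

```shell
# Sketch only: merge the per-chromosome CRAM files of one sample into a single
# CRAM before building the sarek samplesheet. The function name, the DRY_RUN
# switch, and the file names are hypothetical.
merge_sample_crams() {
    sample="$1"
    reference="$2"
    shift 2
    out="${sample}.merged.cram"
    # DRY_RUN=1 (the default) prints the samtools commands instead of running them.
    if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; else run=""; fi
    # -f overwrites an existing output file; -O CRAM keeps the CRAM format.
    $run samtools merge -f -O CRAM --reference "$reference" -o "$out" "$@"
    # Create the index (.crai) that goes in the crai column.
    $run samtools index "$out"
}

# Example with made-up file names (dry run, nothing is executed):
merge_sample_crams sampleA GRCh38.fa sampleA.chr1.cram sampleA.chr2.cram
```

Setting DRY_RUN=0 would execute the commands. The merged CRAM and its index could then go into a samplesheet without a lane column, as in the run that worked.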

Jun 11 '25 11:06 natir

What sort of variant calling do you want to do? I am asking because for small variants we are parallelising across the chromosomal regions, whereas for SVs we usually look at everything at once.

Jun 11 '25 13:06 FriederikeHanssen

We want to perform all variant calling, both SV and SNV.

Jun 12 '25 08:06 natir