Support CRAM input split by chromosome
Description of the bug
We are trying to run sarek in a rather unusual setup:
- alignments are stored as CRAM
- alignments are split by chromosome
- alignments are stored on an S3-like system
We tried adding a lane column to the input file so that one sample can have multiple input files.
However, when the lane column is present, sarek requires bam and bai columns and does not accept cram and crai columns, even though cram and crai columns are allowed for the prepare_recalibration step.
Command used and terminal output
tmp_input.csv and tmp_input2.csv are attached.
$ nextflow run nf-core/sarek -profile conda --input tmp_input.csv --genome GATK.GRCh38 --outdir ./result --step prepare_recalibration # this run failed
$ nextflow run nf-core/sarek -profile conda --input tmp_input2.csv --genome GATK.GRCh38 --outdir ./result --step prepare_recalibration # this run works
Relevant files
System information
- Linux 3.10
- Conda 25.3.1
- nextflow 25.04.3
- executor local
Interesting. So you would like the ability to input split samples at a later step. I think we need to work out how to support this without bloating the logic. In the meantime, is it an option for you to merge the CRAM files before the run? What do you want to do after recalibration?
In fact, we want to run the variant calling part of the pipeline on hundreds of samples already aligned with bwa-mem2, and we want to avoid realignment.
Obviously, we can work around this bug by merging the CRAM files before running the pipeline.
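For anyone hitting the same problem, here is a rough sketch of that merge workaround. It assumes samtools is installed, that the per-chromosome CRAMs are available on a local path, and that `ref.fa` is the FASTA the CRAMs were written against. The helper name `build_merge_cmd` and the file naming are hypothetical, not part of sarek. The function only prints the `samtools merge` command so it can be reviewed before anything runs.

```shell
#!/usr/bin/env bash
# Sketch only: build a samtools command that merges per-chromosome CRAMs
# into one CRAM per sample. Names and paths are illustrative assumptions.
set -euo pipefail

# Usage: build_merge_cmd <sample> <reference.fa> <outdir> <in1.cram> [in2.cram ...]
# Prints the command instead of running it (paths must not contain spaces).
build_merge_cmd() {
    local sample="$1" ref="$2" outdir="$3"
    shift 3
    # --reference is needed to decode/encode CRAM; --write-index also
    # writes an index next to the merged output.
    printf 'samtools merge --reference %s --output-fmt cram --write-index -o %s/%s.merged.cram' \
        "$ref" "$outdir" "$sample"
    # Remaining arguments are the per-chromosome input CRAMs.
    printf ' %s' "$@"
    printf '\n'
}
```

Calling `build_merge_cmd sampleA ref.fa merged sampleA.chr1.cram sampleA.chr2.cram` prints a single `samtools merge` line. Once it looks right, it can be piped to `bash`, and the merged CRAM and its index can go into the cram and crai columns of a samplesheet without a lane column.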
What sort of variant calling do you want to do? I am asking because for small variants we are parallelising across the chromosomal regions, whereas for SVs we usually look at everything at once.
We want to perform all variant calling, both SV and SNV.