sambamba
Sambamba slice - 94 Segmentation fault (core dumped)
Hello, I am running sambamba v1.0.1 from within a Docker image on Google Cloud via Nextflow. The purpose is to slice a BAM file (human genome) by chromosome, so I am providing a BED file with the coordinates of a single chromosome at a time (as a channel) to a Nextflow process:
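For context, each per-chromosome BED passed on the channel is just a coordinate line; a hypothetical chr1 file (using the GRCh38 chr1 length) could be built like this:

```shell
# Hypothetical example: build a single-chromosome BED (GRCh38 chr1 length)
printf 'chr1\t0\t248956422\n' > chr1.bed
```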
```
process sambamba_slice_bam {
    container 'gcr.io/diagnostics-uz/sambamba_v1.0.1@sha256:f6947d458d2a225580976b1ce8e238a07098073307700fd41bb0cda910956b28'
    label 'lotsOfWork'
    machineType 'e2-highmem-16'
    memory '16 GB'
    maxForks 8
    disk { 20.GB + ( 3.B * bam.size() ) }

    input:
    tuple val(sample_id), path(bam), path(bai)
    path chromosome_bed
    val num_threads

    output:
    tuple val(sample_id), path("results/*.bam"), path("results/*.bai"), emit: indexed_sliced_bam

    shell:
    '''
    mkdir -p results

    # get list of chromosomes to slice
    CHROMOSOMES_TO_SLICE=$(cat !{chromosome_bed} | while read chr start end; do echo "$chr"; done | sort | uniq | xargs)

    SAMBAMBA_EXEC=/work/apps/sambamba/sambamba

    for chrom in ${CHROMOSOMES_TO_SLICE}; do
        echo -e "Working on chromosome ${chrom} ...\n"
        single_chrom_bed="!{sample_id}.${chrom}.sliced.bed"
        echo -e "Constructing ${single_chrom_bed} to slice bam for ${chrom}...\n"
        OUTBAM=$(basename ${single_chrom_bed} .bed).bam
        grep -P "^${chrom}\s" "!{chromosome_bed}" > "${single_chrom_bed}"

        # perform slicing
        ${SAMBAMBA_EXEC} slice -o "results/${OUTBAM}" -L "${single_chrom_bed}" "!{bam}"

        # index sliced BAM
        ${SAMBAMBA_EXEC} index --nthreads=!{num_threads} "results/${OUTBAM}"
    done
    echo -e "ALL DONE\n"
    '''
}
```
I am getting the following error:

```
sambamba 1.0.1
 by Artem Tarasov and Pjotr Prins (C) 2012-2023
    LDC 1.32.0 / DMD v2.102.2 / LLVM14.0.6 / bootstrap LDC - the LLVM D compiler (1.32.0)

/mnt/disks/gcap-nf-scratch/f1/c1747bb64e922dbfeabe384eee928d/.command.sh: line 9:    94 Segmentation fault      (core dumped) ${SAMBAMBA_EXEC} slice -o "results/${OUTBAM}" -L "${single_chrom_bed}" "277469.recalibrated.sorted.bam"
```
Any idea what the problem is?
I've had a similar issue on my institution's cluster. There it was because the D runtime underlying Sambamba cannot handle some modern hardware: it uses a `ubyte` when estimating the CPU cache size, which doesn't work with either the number of CPUs or the type of CPUs, and causes a division by zero further down the line.
What solved it for us was to request older CPUs for the job.
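If you suspect the same hardware issue, a quick sanity check (assuming a Linux job environment) is to log which CPU each task actually landed on, e.g. at the top of the process script:

```shell
# Log the CPU model and cache sizes the job sees (Linux only); this helps
# confirm whether the segfaults correlate with a particular CPU generation.
grep -m1 'model name' /proc/cpuinfo || true
grep 'cache size' /proc/cpuinfo | sort -u || true
```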
I ended up using `sambamba view` instead.
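For anyone landing here, a sketch of that workaround, reusing the variables from the process above: `sambamba view -L` also restricts output to the regions in a BED file, and `-f bam` selects BAM output. Paths and names are taken from the question, not verified:

```shell
# Workaround sketch: replace `sambamba slice` with `sambamba view`.
# -f bam : write BAM output
# -L     : keep only reads overlapping regions in the BED file
SAMBAMBA_EXEC=/work/apps/sambamba/sambamba   # path from the original post

for chrom in ${CHROMOSOMES_TO_SLICE}; do
    single_chrom_bed="${sample_id}.${chrom}.sliced.bed"
    OUTBAM=$(basename "${single_chrom_bed}" .bed).bam
    grep -P "^${chrom}\s" "${chromosome_bed}" > "${single_chrom_bed}"

    "${SAMBAMBA_EXEC}" view -f bam -L "${single_chrom_bed}" \
        -o "results/${OUTBAM}" "${bam}"
    "${SAMBAMBA_EXEC}" index "results/${OUTBAM}"
done
```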