speedseq
SVTyper fails to genotype with hg38 alignments. --split_bam (-S) is deprecated
I am using version 0.1.2 of speedseq and version v0.1.4 of svtyper.
I have 3 BAM files aligned with bwa mem to the hg38 reference. This is my speedseq command:
speedseq sv \
-B NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved.bam,NA19239.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved.bam,NA19240.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved.bam \
-D NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_disc.bam,NA19239.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_disc.bam,NA19240.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_disc.bam \
-S NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_splt.bam,NA19239.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_splt.bam,NA19240.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_splt.bam \
-R GRCh38_full_analysis_set_plus_decoy_hla.fa \
-o hg38_yri \
-x GRCh38-centromere-gaps-segdups-combined.bed \
-g \
-t 8
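The three comma-separated lists passed to -B, -D and -S can also be built in a loop rather than typed out. This is only a minimal sketch, assuming a POSIX shell and that every file follows the naming pattern shown above; the variable names (samples, suffix, B_LIST, D_LIST, S_LIST) are illustrative and not part of speedseq.

```shell
# Sample IDs and the shared file-name suffix, taken from the command above.
samples="NA19238 NA19239 NA19240"
suffix=".alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved"

B_LIST=""   # full BAMs
D_LIST=""   # discordant-read BAMs
S_LIST=""   # split-read BAMs

for s in $samples; do
    # ${VAR:+$VAR,} emits "previous," only when VAR is already non-empty,
    # so the result has no leading or trailing comma.
    B_LIST="${B_LIST:+$B_LIST,}${s}${suffix}.bam"
    D_LIST="${D_LIST:+$D_LIST,}${s}${suffix}_disc.bam"
    S_LIST="${S_LIST:+$S_LIST,}${s}${suffix}_splt.bam"
done

echo "$B_LIST"
```

The resulting variables could then be passed as -B "$B_LIST" -D "$D_LIST" -S "$S_LIST", which keeps the sample order identical across the three lists.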
I generated the discordant and split read files with the following commands
samtools view -bh -@ 8 -F 1294 $BAM | samtools sort -@ 8 -o $DISC_BAM
samtools view -@ 8 -h $BAM | /home/usr/bin/speedseq/src/lumpy-sv/scripts/extractSplitReads_BwaMem -i stdin | samtools view -@ 8 -b | samtools sort -@ 8 -o $SPLIT_BAM
I ran the speedseq command twice, each time with a different exclusion file. The exclusion file above combines hg38 centromeres, assembly gaps, and segmental duplications.
The second exclusion file was an hg38 lift-over of the hg19 lumpy exclusion file packaged with speedseq.
I get the same error for both jobs.
Here is the error message:
Warning: --split_bam (-S) is deprecated. Ignoring NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_splt.bam.
Calculating library metrics from NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved.bam... done
slurmstepd: *** JOB 11382902 ON XXXX CANCELLED AT 2017-10-01T11:44:24 DUE TO TIME LIMIT ***
Note that the wall time limit was 48 hours.
Here's the STDOUT
LUMPY Express done
# genotype structural variants
python2.7 /home/usr/bin/speedseq/bin/svtyper -q -i hg38_yri.wAFJ3E8dZT5V/hg38_yri.sv.vcf -B NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved.bam -S NA19238.alt_bwamem_GRCh38DH.20150715.YRI.high_coverage_markedDupsRemoved_splt.bam > hg38_yri.wAFJ3E8dZT5V/hg38_yri.sv.gt.vcf ; mv hg38_yri.wAFJ3E8dZT5V/hg38_yri.sv.gt.vcf hg38_yri.wAFJ3E8dZT5V/hg38_yri.sv.vcf
Here's an entry from the SVTyper output:
chr1 20712560 105 N <DEL> 9.08 . SVTYPE=DEL;SVLEN=-179;END=20712739;STRANDS=+-:5;CIPOS=-10,9;CIEND=-10,9;CIPOS95=0,0;CIEND95=0,0;SU=5;PE=0;SR=5 GT:SU:PE:SR:GQ:SQ:GL:DP:RO:AO:QR:QA:RS:AS:ASC:RP:AP:AB 0/1:4:0:4:9:9.08:-14,-13,-59:78:69:8:69:8:69:5:2:0:0:0.1 ./.:0:0:0:.:.:.:.:.:.:.:.:.:.:.:.:.:. ./.:1:0:1:.:.:.:.:.:.:.:.:.:.:.:.:.:.
Note that only one sample is genotyped. This is the case for all variants in the VCF. Only the first sample is genotyped.
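One way to confirm this across the whole file is to count called genotypes per sample column. The following is a minimal sketch, assuming an uncompressed, tab-delimited VCF with GT as the first FORMAT field (as in the record above); the function name count_called_genotypes and the file name calls.vcf are placeholders, not part of speedseq or svtyper.

```shell
# Print, for each sample column, the number of records with a called genotype
# (i.e. a GT value other than "./." or ".").
count_called_genotypes() {
    awk -F'\t' '
        /^##/ { next }                       # skip meta-information lines
        /^#CHROM/ {                          # header line: remember sample names
            for (i = 10; i <= NF; i++) name[i] = $i
            n = NF
            next
        }
        {
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")            # GT is assumed to be the first sub-field
                if (f[1] != "./." && f[1] != ".") called[i]++
            }
        }
        END {
            for (i = 10; i <= n; i++) printf "%s\t%d\n", name[i], called[i]
        }
    ' "$1"
}
```

Running count_called_genotypes calls.vcf prints one line per sample; a count of 0 for every sample after the first would match the behaviour described here.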
I have used these commands successfully in the past with hg19 aligned genomes without error. I'm not sure what's causing the error messages I'm receiving.
- Why am I receiving the "-S is deprecated" warning?
- Why is only one sample genotyped?
- Is there an hg38 exclusion file that you recommend?
Thank you for your time
Hi, we are trying to use speedseq sv for SV calling in our whole-genome analysis pipeline. We processed the HapMap sample NA12877. I get the same warning when I run speedseq sv on the BAM outputs from speedseq align: "Warning: --split_bam (-S) is deprecated. Ignoring NA12877_10X.splitters.bam." Is this a known issue?
Thank you, Sithara
Hi,
I ran into the same warning. I am wondering whether it is a serious problem for CNV calling.
Thanks!
Elaine