vg
vg copied to clipboard
Mapping very short reads with vg giraffe
Hi vg team!
I'm trying to use vg giraffe for mapping very short paired-end reads (36bp) against a pangenome.
I tried running this command line
> vg giraffe -M 10 -x ${panIdxPath}/${panPre}.xg -g ${panIdxPath}/${panPre}.gg \
-H ${panIdxPath}/${panPre}.gbwt -L 2000 -m ${panIdxPath}/${panPre}.min \
-d ${panIdxPath}/${panPre}.dist -p -f ${fastq1} -f ${fastq2} -t 30 1> ${outPre}.gam \
--report-name ${log} 2> ${log}_run.log
The program didn't produce an error but it returned single-end alignments since it failed to pair reads.
warning[vg::giraffe]: Encountered 100000 ambiguously-paired reads before finding enough
unambiguously-paired reads to learn fragment length distribution. Are you sure
your reads are paired and your graph is not a hairball?
warning[vg::giraffe]: Finalizing fragment length distribution before reaching maximum sample size
mapped 0 reads single ended with 100000 pairs of reads left unmapped
mean: 0, stdev: 1
warning[vg::giraffe]: Cannot cluster reads with a fragment distance smaller than read distance
Fragment length distribution: mean=0, stdev=1
Fragment distance limit: 2, read distance limit: 200
warning[vg::giraffe]: Falling back on single-end mapping
Using fragment length estimate: 0 +/- 1
I tried re-running vg giraffe setting different values for some parameters I understood are related to read length:
- --distance-limit 36
- --distance-limit 86
- --paired-distance-limit 3
- --rescue-algorithm gssw
but in all cases, I obtained the same results.
I also tried to manually set --fragment-mean 86 and --fragment-stdev 80. In this case, warning messages didn't appear in the log file, instead it reports the defined values were used.
Using fragment length estimate: 86 +/- 80
However, when I checked the stats (samtools flagstats
) for the bam file obtained with vg surject
, the percentage of properly paired reads is 0%
> vg surject -b -i -t 30 -x ${panIdxPath}/${panPre}.xg ${prefix}.gam 1> ${prefix}.bam
> samtools flagstat -@30 ${prefix}.bam
326331274 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
326331274 + 0 paired in sequencing
163165637 + 0 read1
163165637 + 0 read2
0 + 0 properly paired (0.00% : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
I read the documentation and wiki page but I didn't find a way to fix it. I also tried to look for a parameter for reducing the seed value (default 29) but I didn't find it.
Is there any limitation in the vg giraffe implementation for the minimum read length for paired-end experiments? I ran it using 76PE and 101PE and it worked fine. If it is not, can you please recommend me alternatives for being able to get alignments for properly paired reads for this experiment?
Thank you for your help!
Gabriela