vg
vg copied to clipboard
Mapping very short reads with vg giraffe
Hi vg team!
I'm trying to use vg giraffe for mapping very short paired-end reads (36bp) against a pangenome.
I tried running this command line
> vg giraffe -M 10 -x ${panIdxPath}/${panPre}.xg -g ${panIdxPath}/${panPre}.gg \
-H ${panIdxPath}/${panPre}.gbwt -L 2000 -m ${panIdxPath}/${panPre}.min \
-d ${panIdxPath}/${panPre}.dist -p -f ${fastq1} -f ${fastq2} -t 30 1> ${outPre}.gam \
--report-name ${log} 2> ${log}_run.log
The program didn't produce an error but it returned single-end alignments since it failed to pair reads.
warning[vg::giraffe]: Encountered 100000 ambiguously-paired reads before finding enough
unambiguously-paired reads to learn fragment length distribution. Are you sure
your reads are paired and your graph is not a hairball?
warning[vg::giraffe]: Finalizing fragment length distribution before reaching maximum sample size
mapped 0 reads single ended with 100000 pairs of reads left unmapped
mean: 0, stdev: 1
warning[vg::giraffe]: Cannot cluster reads with a fragment distance smaller than read distance
Fragment length distribution: mean=0, stdev=1
Fragment distance limit: 2, read distance limit: 200
warning[vg::giraffe]: Falling back on single-end mapping
Using fragment length estimate: 0 +/- 1
I tried re-running vg giraffe setting different values for some parameters I understood are related to read length:
- --distance-limit 36
- --distance-limit 86
- --paired-distance-limit 3
- --rescue-algorithm gssw
but in all cases, I obtained the same results.
I also tried to manually set --fragment-mean 86 and --fragment-stdev 80. In this case, warning messages didn't appear in the log file, instead it reports the defined values were used.
Using fragment length estimate: 86 +/- 80
However, when I checked the stats (samtools flagstats
) for the bam file obtained with vg surject
, the percentage of properly paired reads is 0%
> vg surject -b -i -t 30 -x ${panIdxPath}/${panPre}.xg ${prefix}.gam 1> ${prefix}.bam
> samtools flagstat -@30 ${prefix}.bam
326331274 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
0 + 0 mapped (0.00% : N/A)
326331274 + 0 paired in sequencing
163165637 + 0 read1
163165637 + 0 read2
0 + 0 properly paired (0.00% : N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
I read the documentation and wiki page but I didn't find a way to fix it. I also tried to look for a parameter for reducing the seed value (default 29) but I didn't find it.
Is there any limitation in the vg giraffe implementation for the minimum read length for paired-end experiments? I ran it using 76PE and 101PE and it worked fine. If it is not, can you please recommend me alternatives for being able to get alignments for properly paired reads for this experiment?
Thank you for your help!
Gabriela
By default, Giraffe uses k = 29, w = 11 minimizers. If the reads are shorter than the minimizer window (k+w-1 bp), there won't be any seeds. You can adjust the minimizer parameters by building a new minimizer index. For example:
vg minimizer -p -t 16 -k 21 -w 11 -d graph.dist -g graph.gbwt -o graph.min graph.gg
Hi @jltsiren! Thank you!
I was able to build this new minimizer index. Now, for running vg giraffe
for my PE 36bp reads do you recommend using default parameter values or considering some of the alternatives I was trying before like:
- --distance-limit 36
- --distance-limit 86
- --paired-distance-limit 3
- --rescue-algorithm gssw
Thanks in advance!
I would try the default parameters first. Nobody has much experience with using Giraffe with very short reads, but I'd expect that if any issues arise, they will be about not having enough seeds.
You may have to try different minimizer parameters if the fraction of mapped reads is lower than expected.
Hi @jltsiren!
Thanks for your suggestions. I created a new minimizers index with -k 15 -w 5
and then used it to map my PE 36bp reads running this command:
vg giraffe -M 10 -x ${panIdxPath}/${panPre}.xg -g ${panIdxPath}/${panPre}.gg -H ${panIdxPath}/${panPre}.gbwt \
-L 2000 -m ${panIdxPath}/${panPre}_k15w5.min -d ${panIdxPath}/${panPre}.dist -p -f ${fastq1} -f ${fastq2} -t 40 \
-D 86 1> ${outPre}.gam --report-name ${log} 2> ${log}_run.log
It worked for most replicates but for one of them, I'm having an error in the *run.log
report.
Guessing that hprc-v1.0-mc-chm13-minaf.0.1.giraffe.gbz is Giraffe GBZ
Preparing Indexes
Loading Minimizer Index
Loading GBZ
Loading Distance Index v2
Paging in Distance Index v2
Initializing MinimizerMapper
Loading and initialization: 195.263 seconds
Of which Distance Index v2 paging: 6.36955 seconds
Mapping reads to "-" (GAM)
--watchdog-timeout 10
--max-multimaps 10
--hit-cap 10
--hard-hit-cap 500
--score-fraction 0.9
--max-min 500
--num-bp-per-min 1000
--distance-limit 86
--max-extensions 800
--max-alignments 8
--cluster-score 50
--pad-cluster-score 20
--cluster-coverage 0.3
--extension-score 1
--extension-set 20
--rescue-attempts 15
--max-fragment-length 2000
--paired-distance-limit 2
--rescue-subgraph-size 4
--rescue-seed-limit 100
--chaining-cluster-distance 100
--precluster-connection-coverage-threshold 0.3
--min-precluster-connections 10
--max-precluster-connections 50
--max-lookback-bases 100
--min-lookback-items 1
--lookback-item-hard-cap 15
--chain-score-threshold 100
--min-chains 1
--chain-min-score 100
--max-chain-connection 100
--max-tail-length 100
--max-dp-cells 16777216
--rescue-algorithm dozeu
Using fragment length estimate: 75.6621 +/- 40.152
vg: src/dozeu_interface.cpp:126: vg::DozeuInterface::graph_pos_s vg::DozeuInterface::calculate_max_position(const vg::DozeuInterface::OrderedGraph&, const vg::DozeuInterface::graph_pos_s&, size_t, bool, const std::vector<const dz_forefront_s*>&): Assertion `forefronts.at(max_node_index)->mcap != nullptr' failed.
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.48.0 "Gallipoli"
Stack trace (most recent call last) in thread 181304:
#20 Object "", at 0xffffffffffffffff, in
#19 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x1f8f7f2, in __clone
#18 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x1492898, in start_thread
#17 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x1ec3ffd, in gomp_thread_start
#16 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x118d73f, in unsigned long vg::io::paired_for_each_parallel_after_wait<vg::Alignment>(std::function<bool (vg::Alignment&, vg::Alignment&)>, std::function<void (vg::Alignment&, vg::Alignment&)>, std::function<bool ()>, unsigned long) [clone ._omp_fn.0]
#15 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x1ec68d7, in gomp_team_barrier_wait_end
#14 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x1ebdf9b, in gomp_barrier_handle_tasks
#13 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x118d8ea, in unsigned long vg::io::paired_for_each_parallel_after_wait<vg::Alignment>(std::function<bool (vg::Alignment&, vg::Alignment&)>, std::function<void (vg::Alignment&, vg::Alignment&)>, std::function<bool ()>, unsigned long) [clone ._omp_fn.1]
#12 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0xd83600, in std::_Function_handler<void (vg::Alignment&, vg::Alignment&), main_giraffe(int, char**)::{lambda()#1}::operator()() const::{lambda(vg::Alignment&, vg::Alignment&)#6}>::_M_invoke(std::_Any_data const&, vg::Alignment&, vg::Alignment&)
#11 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0xeeeb94, in vg::MinimizerMapper::map_paired(vg::Alignment&, vg::Alignment&, std::vector<std::pair<vg::Alignment, vg::Alignment>, std::allocator<std::pair<vg::Alignment, vg::Alignment> > >&)
#10 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0xee958b, in vg::MinimizerMapper::map_paired(vg::Alignment&, vg::Alignment&)
#9 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0xedd4ba, in void vg::MinimizerMapper::process_until_threshold_c<double>(unsigned long, std::function<double (unsigned long)> const&, std::function<bool (unsigned long, unsigned long)> const&, double, unsigned long, unsigned long, vg::LazyRNG&, std::function<bool (unsigned long)> const&, std::function<void (unsigned long)> const&, std::function<void (unsigned long)> const&) const [clone .constprop.0]
#8 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0xef0ae8, in vg::MinimizerMapper::map_paired(vg::Alignment&, vg::Alignment&)::{lambda(unsigned long)#16}::operator()(unsigned long) const
#7 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0xeefacc, in vg::MinimizerMapper::attempt_rescue(vg::Alignment const&, vg::Alignment&, vg::VectorView<vg::MinimizerMapper::Minimizer> const&, bool)
#6 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0xfb63d2, in vg::Aligner::align_xdrop(vg::Alignment&, handlegraph::HandleGraph const&, std::vector<handlegraph::handle_t, std::allocator<handlegraph::handle_t> > const&, std::vector<vg::MaximalExactMatch, std::allocator<vg::MaximalExactMatch> > const&, bool, unsigned short) const
#5 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x121d9ab, in vg::DozeuInterface::align(vg::Alignment&, handlegraph::HandleGraph const&, std::vector<handlegraph::handle_t, std::allocator<handlegraph::handle_t> > const&, std::vector<vg::MaximalExactMatch, std::allocator<vg::MaximalExactMatch> > const&, bool, signed char, unsigned short)
#4 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x121b9d1, in vg::DozeuInterface::calculate_max_position(vg::DozeuInterface::OrderedGraph const&, vg::DozeuInterface::graph_pos_s const&, unsigned long, bool, std::vector<dz_forefront_s const*, std::allocator<dz_forefront_s const*> > const&)
#3 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x1ee7555, in __assert_fail
#2 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x5bfed7, in __assert_fail_base.cold
#1 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x5c0007, in abort
#0 Object "/homes/gmerino/anaconda3/envs/vg/bin/vg", at 0x149611b, in raise
ERROR: Signal 6 occurred. VG has crashed. Visit https://github.com/vgteam/vg/issues/new/choose to report a bug.
Please include this entire error log in your bug report!
━━━━━━━━━━━━━━━━━━━━
error[MinimizerMapper::faster_cap]: Minimizers seem impossible to disrupt in region 16 17 0 6
━━━━━━━━━━━━━━━━━━━━
Crash report for vg v1.48.0 "Gallipoli"
Stack trace (most recent call last) in thread 181310:
#16 Object "", at 0xffffffffffffffff, in
Can you please indicate what is the source of this error?
Thanks for your help!
I am running into this same issue, have you been able to figure it out?