VG Construct fails: alignment does not start with match over padded sequence

Open cademirch opened this issue 5 years ago • 5 comments

I have constructed graphs previously without encountering this issue. Any ideas or solutions? Thanks in advance.

My command looks like this:

vg construct -r ref.fna -v sample.vcf.gz > sample.vg

And this is the full error:

warning:[vg::Constructor] Lowercase characters found in NC_004354.4; coercing to uppercase.
parsedAlternates: alignment does not start with match over padded sequence
15M4I9M1S
ZZZZZZZZZZQTTTZZZZZZZZZZ
ZZZZZZZZZZQNON_REF>ZZZZZZZZZZ

cademirch avatar Apr 10 '19 21:04 cademirch

It looks like your VCF has the string NON_REF (maybe <NON_REF>?) somewhere in an alt. We can't deal with that variant; we can only deal with variants that actually specify the alternate allele fully.

Try dropping the variant with something like:

zcat sample.vcf.gz | grep -v "NON_REF" > sample.clean.vcf
bgzip sample.clean.vcf
tabix -p vcf sample.clean.vcf.gz

We really ought to produce a better error message than this when we encounter this situation, though. The text of that symbolic alt is getting into the gears of the vcflib align-the-alts-to-the-ref code and exploding, instead of being caught earlier.
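Note that grep -v "NON_REF" drops every line containing that string, header lines included. A narrower filter could look like the sketch below. It assumes a standard tab-separated VCF with ALT in the fifth column, and the function name drop_symbolic_alts is made up for illustration. It still removes the whole record, so a fully specified alt that shares a line with <NON_REF> is lost as well.

```shell
# Keep header lines; drop only records whose ALT column (field 5)
# contains a symbolic allele such as <NON_REF>.
# Assumption: tab-separated VCF with ALT in column 5.
drop_symbolic_alts() {
    awk -F'\t' '/^#/ || $5 !~ /</'
}

# Possible usage with the file names from this thread:
# zcat sample.vcf.gz | drop_symbolic_alts > sample.clean.vcf
```

The bgzip and tabix steps would then be the same as above.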

adamnovak avatar Apr 11 '19 22:04 adamnovak

Hi @adamnovak

I got a similar error message, but mine does not contain NON_REF (I abbreviated parts of it because it was too long):

Restricting to chr22 from 1 to end
building graph for chr22

parsedAlternates: alignment does not start with match over padded sequence
71400S
ZZZ...(skip)...ZZQZZ...(skip)...ZZZ
ZZZ...(skip)...ZZQTGG...(skip)...CAGZZZ...(skip)...ZZZ

I also checked that the ALT sequence region contains only A, T, G, and C, using the following command:

cat error_message | grep [ATGC]
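One caveat: grep [ATGC] prints every line that contains at least one of those letters, so it cannot show that no other characters are present. A stricter check against the VCF itself might look like the sketch below. It assumes a tab-separated VCF with ALT in the fifth column; find_non_acgt_alts is a made-up name, and sample.vcf.gz stands in for whatever input file was used.

```shell
# Print every record whose ALT column (field 5) contains anything other
# than uppercase A, C, G, T or the comma that separates multiple alleles.
# No output means every ALT is a plain ACGT sequence.
# Assumption: tab-separated VCF with ALT in column 5.
find_non_acgt_alts() {
    awk -F'\t' '!/^#/ && $5 !~ /^[ACGT,]+$/'
}

# Possible usage (input file name is a placeholder):
# zcat sample.vcf.gz | find_non_acgt_alts | head
```

Lowercase bases, N, and symbolic alleles are all reported by this check, which makes it easier to see what kind of ALT is present.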

JYLeeBioinfo avatar Sep 17 '19 07:09 JYLeeBioinfo

This is a horrible hack in vcflib, where I'm trying to fake a global alignment using these weird characters in the allele strings.

We should change this to use one of the alignment methods in vg. A good candidate would be the banded global aligner, or maybe xdrop. Either would be better than this hack.

You can avoid this by setting --flat-alts in vg construct.

ekg avatar Sep 17 '19 08:09 ekg

Thank you for the kind reply, Erik!

The error went away after applying --flat-alts.

JYLeeBioinfo avatar Sep 17 '19 08:09 JYLeeBioinfo

We are still working on this. An update to vcflib will eliminate this error.

ekg avatar Apr 27 '22 06:04 ekg