vg construct unsupported variant allele

Open yihangs opened this issue 1 year ago • 4 comments

I ran vg construct (vg v1.39.0) on a VCF file (VCFv4.2) with hg19 as the reference, but got the following warning: "warning:[vg::Constructor] Unsupported variant allele <DEL>; Skipping variant(s)".

After some simple debugging, I found that vg does not seem to support VCF files whose ALTs are represented by IDs (e.g. <DEL>), such as: [image: Screen Shot 2022-07-13 at 10 59 30 AM]

It does, however, support VCF files whose ALTs are represented by sequences, such as: [image: Screen Shot 2022-07-13 at 11 00 58 AM]

Since the former is a fairly common VCF format, I wonder whether there is some way to make vg construct support that kind of VCF file as well.

Thanks!

yihangs avatar Jul 13 '22 15:07 yihangs

Have you tried vg construct -S -f ?

glennhickey avatar Jul 28 '22 13:07 glennhickey

I tried vg construct -S -f and found that the unsupported variant allele warning no longer appears. I checked the output .vg file and found that, although this encodes some ALTs such as <DEL> and <INV>, it still does not support all of them, e.g. <DUP:TANDEM>. More importantly, vg does not seem to support ALTs that are represented by breakends, such as G]chr13:81121773], meaning most complex structural variants and inter-chromosomal translocations are removed. Since I am trying to use vg to map reads from some cancer cell types, inter-chromosomal structural variants are very important to me. Is there any way to construct a genome graph that contains inter-chromosomal SVs, either with the vg command line or with other genome graph construction software? Thank you!
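For readers who want to check what kinds of ALT alleles their own VCF contains before running vg construct, here is a minimal Python sketch. It sorts ALT entries into the three styles discussed in this thread: plain sequences, IDs in angle brackets such as <DEL>, and breakends such as G]chr13:81121773]. The function names `classify_alt` and `count_alt_classes` are made up for illustration and are not part of vg; the classification is a simple heuristic, not a full VCF parser.

```python
import re
from collections import Counter


def classify_alt(alt: str) -> str:
    """Classify one ALT allele as 'symbolic', 'breakend', 'sequence' or 'other'.

    Heuristic only: single breakends (e.g. '.A') and the '*' allele
    fall into 'other'.
    """
    if alt.startswith("<") and alt.endswith(">"):
        return "symbolic"          # e.g. <DEL>, <INV>, <DUP:TANDEM>
    if "[" in alt or "]" in alt:
        return "breakend"          # e.g. G]chr13:81121773]
    if re.fullmatch(r"[ACGTNacgtn]+", alt):
        return "sequence"          # explicit bases
    return "other"


def count_alt_classes(vcf_lines):
    """Count ALT allele classes over an iterable of uncompressed VCF lines."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue               # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue               # malformed record, ignore
        for alt in fields[4].split(","):
            counts[classify_alt(alt)] += 1
    return counts
```

To use it on a file, pass an open text handle, e.g. `count_alt_classes(open("variants.vcf"))`, where `variants.vcf` is a placeholder path; a bgzipped VCF would need `gzip.open(path, "rt")` instead.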

yihangs avatar Aug 09 '22 16:08 yihangs

Yeah, vg construct -S only supports <DEL>, <INV> and <INS>. It would be nice to support the full VCF spec, but I don't think anyone's working on that now. Most of the development is going towards building graphs directly from assemblies without going through VCF. And while I hope to see pangenome graphs applied to cancer genomes in the future, as far as I know most of the work on whole-genome graphs at present is still focused on germline variation:

minigraph doesn't, as far as I know, presently support constructing graphs with any inter-chromosomal events.

minigraph-cactus could in theory add some in (if you do not run cactus-graphmap-split), but this hasn't been well tested.

cactus (without minigraph) may also work if you have very few samples (and all assemblies are run through RepeatMasker first).

But PGGB is probably your best bet, provided you don't want too many samples in your graph (it also relies on chromosome splitting to help scale but can be run without it). Also, it may be difficult to map reads to the graph with current tools.

glennhickey avatar Aug 12 '22 14:08 glennhickey

Thank you for your reply! So vg construct also does not support <DUP> (not DUP:TANDEM)?
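Going only by the reply above, which lists <DEL>, <INV> and <INS> as the types handled by vg construct -S, a small pre-check like the following Python sketch could list the records that would probably be left out of the graph. The set `HANDLED_BY_S` is an assumption taken from that reply, not from vg's source code, and `unhandled_alts` is a hypothetical helper name.

```python
# Assumption: taken from the maintainer's reply in this thread,
# not verified against the vg source code.
HANDLED_BY_S = {"<DEL>", "<INV>", "<INS>"}


def unhandled_alts(vcf_lines):
    """Yield (chrom, pos, alt) for ALT alleles that -S likely skips.

    Flags breakend ALTs and any symbolic ALT outside HANDLED_BY_S.
    Plain sequence ALTs are never flagged.
    """
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue               # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue               # malformed record, ignore
        chrom, pos = fields[0], int(fields[1])
        for alt in fields[4].split(","):
            is_symbolic = alt.startswith("<") and alt.endswith(">")
            is_breakend = "[" in alt or "]" in alt
            if is_breakend or (is_symbolic and alt not in HANDLED_BY_S):
                yield chrom, pos, alt
```

Under that assumption, both <DUP> and <DUP:TANDEM> would be reported, as would every breakend record.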

yihangs avatar Aug 12 '22 15:08 yihangs

Hi contributors, I am facing a similar problem: vg is not able to merge <DUP> and <BND> into graph.vg when I run vg construct -S. Has there been any progress on this? I notice the issue is still open.

Hi @yihangs, I am very interested in your research and would like to get in touch, if you are open to it. If so, could you email me? You can reach me at [email protected]

maxineliu avatar Feb 08 '23 17:02 maxineliu