Stop combining variants in `vtools export`

Open BoPeng opened this issue 7 years ago • 1 comments

I have those two variants in my vtools variant database:

4       106156653       T       C                       Scan1,Scan2                                                ....,.,.,.,....
4       106156653       T       G                       Scan1,Scan2                                                ....,.,.,.,....

I export them to VCF with the following command:

vtools export variant --format $SCRIPTS/myvcf.fmt --header CHROM POS ID REF ALT QUAL FILTER INFO --var_info callers genotypes --output ./Variants_raw.vcf

The two variants are combined into a single multi-allelic entry like this:

4    106156653 .    T    C,G  .    PASS callers=[u'Scan1|Scan2', u'Scan1|Scan2'];genotypes=[u'....|.|.|.|....', u'....|.|.|.|....']

This is a problem for two reasons: further processing gets corrupted by the multi-allelic variant (MAV), and these strange [] arrays are difficult to parse. I would prefer one output line per variant, just as it would be done via vtools export.

Surely there is a simple workaround for this, but I have not been able to find it yet.

So, can you help me with this once again?

BoPeng, Dec 07 '17 15:12

Changing

export_by=chr,%(pos)s,%(ref)s

to

export_by=chr,%(pos)s,%(ref)s,%(alt)s

in vcf.fmt

[format description]
description=Import vcf
variant=chr,%(pos)s,%(ref)s,%(alt)s
genotype=%(geno)s
variant_info=%(var_info)s
genotype_info=%(geno_info)s
# variants with identical chr,pos,ref will be collapsed.
export_by=chr,%(pos)s,%(ref)s

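should solve the problem: with alt included in export_by, variants that share chr, pos, and ref but differ in alt are no longer collapsed into one line.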
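To see why the extra field matters, here is a minimal Python sketch of the grouping idea. It is only an illustration and not vtools' actual implementation: the `collapse` helper, the tuple layout, and the record values are hypothetical, modelled on the two variants in the question.

```python
from collections import OrderedDict

# Hypothetical records modelled on the question: (chr, pos, ref, alt, callers)
variants = [
    ("4", 106156653, "T", "C", "Scan1|Scan2"),
    ("4", 106156653, "T", "G", "Scan1|Scan2"),
]


def collapse(records, key_fields):
    """Group records by the chosen field indexes; emit one row per group."""
    groups = OrderedDict()
    for rec in records:
        key = tuple(rec[i] for i in key_fields)
        groups.setdefault(key, []).append(rec)

    rows = []
    for recs in groups.values():
        chrom, pos, ref = recs[0][:3]
        alts = ",".join(r[3] for r in recs)   # alt alleles joined per group
        callers = [r[4] for r in recs]        # info values become a list
        rows.append((chrom, pos, ref, alts, callers))
    return rows


# Analogous to export_by=chr,pos,ref: both variants share one key.
by_ref = collapse(variants, key_fields=(0, 1, 2))

# Analogous to export_by=chr,pos,ref,alt: each variant has its own key.
by_alt = collapse(variants, key_fields=(0, 1, 2, 3))

print(len(by_ref), by_ref[0][3])
print(len(by_alt), [row[3] for row in by_alt])
```

Grouping on three fields yields a single row with a joined alt column and list-valued info fields, which resembles the multi-allelic line in the question. Grouping on four fields keeps one row per variant.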

BoPeng, Dec 08 '17 17:12