23andme2vcf

Broken genotypes

Open lhelseth opened this issue 8 years ago • 2 comments

I keep getting just "0" or "1" for the genotype after ~560,000 correct conversions using the 23andme_v4_hg19_ref.txt.gz data, and after ~500,000 correct conversions using the 23andme_v3_hg19_ref.txt.gz data, running "perl 23andme2vcf.pl <path to 23andme txt.zip file> genome_XYZ.vcf 4" (or 3). In both cases the break happens after:

chrX 2689575 rs311150 G A . . . GT 1/1

I've seen this with two different individuals' SNP files, one generated in Feb 2015 and another in Dec 2016. Both the above rsID and the following one are still listed in the current dbSNP, and both entries in the 23andme_v5_ht19_ref.txt.gz appear valid. I've run this on a laptop and on a Linux server, so I don't think it's a resource issue. Any suggestions? Thanks.

Larry

lhelseth, Dec 12 '16 18:12
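For readers unfamiliar with the GT column: VCF numbers each allele by position, 0 for REF and 1 for the first ALT, joined with "/" for unphased calls. The sketch below is a hypothetical helper (`call_to_gt` is an illustrative name, not code from 23andme2vcf.pl) that assumes a biallelic site with single-base REF and ALT. It shows how a two-letter call such as AA at rs311150 (REF G, ALT A) becomes 1/1, and what the same logic yields for a one-letter call.

```python
def call_to_gt(call: str, ref: str, alt: str) -> str:
    """Translate a 23andMe-style genotype call into a VCF GT string.

    Hypothetical helper for illustration only; it is not the code in
    23andme2vcf.pl. Assumes a biallelic site with single-base REF and ALT.
    """
    # "--" is the no-call marker in 23andMe raw data: report a missing diploid GT.
    if call == "--":
        return "./."
    # VCF numbers alleles by position: 0 is REF, 1 is the first ALT.
    allele_index = {ref: "0", alt: "1"}
    # One index per base in the call ("." if the base matches neither allele),
    # sorted so unphased heterozygotes read 0/1 rather than 1/0.
    indices = sorted(allele_index.get(base, ".") for base in call)
    # Two letters give a diploid GT such as "1/1"; one letter gives a lone "0" or "1".
    return "/".join(indices)
```

With this logic, `call_to_gt("AA", "G", "A")` returns "1/1", matching the rs311150 line, while `call_to_gt("A", "G", "A")` returns "1", a single allele index rather than a pair.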

Did you make a 23andme_v5_ht19_ref.txt.gz reference? I've included v3 and v4 references. It's possible that leaving indels in a new reference would make it fail that way, but I wouldn't expect it to get so far before encountering its first indel.

arrogantrobot, Dec 13 '16 02:12
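The question of how far the input gets before its first indel can be checked directly against the raw file. This is a minimal sketch under stated assumptions: it assumes the usual 23andMe raw-data layout (comment lines starting with "#", then tab-separated rsid, chromosome, position, genotype) and assumes indel markers are called with the letters I and D instead of bases. The function name `first_indel_call` is hypothetical and not part of 23andme2vcf.

```python
from typing import Iterable, Optional, Tuple


def first_indel_call(lines: Iterable[str]) -> Optional[Tuple[int, str, str, str, str]]:
    """Return (data_row, rsid, chromosome, position, genotype) for the first
    indel call in 23andMe raw data, or None if there is none.

    Assumes the raw-data layout: '#' comment lines, then tab-separated
    rsid, chromosome, position, genotype, with indels called as I/D.
    """
    data_row = 0
    for line in lines:
        # Skip header comments and blank lines; they are not data rows.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 4:
            continue  # malformed row; ignore it rather than crash
        data_row += 1
        rsid, chromosome, position, genotype = fields[:4]
        # Assumption: insertion/deletion markers use "I"/"D" instead of bases.
        if "I" in genotype or "D" in genotype:
            return data_row, rsid, chromosome, position, genotype
    return None
```

Passing an open handle to the extracted text file (for example `first_indel_call(open("genome.txt"))`, where the file name is hypothetical) scans a real export; the returned row number gives a rough point of comparison with the ~560,000 and ~500,000 conversion counts reported above.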

Rob, no, I just used the v4 reference (and then the v3, when prompted after running v4). Thanks.

Larry


lhelseth, Dec 13 '16 13:12