23andme2vcf Broken genotypes

I keep getting just "0" or "1" for the genotype after ~560,000 correct conversions using the 23andme_v4_hg19_ref.txt.gz data, and after ~500,000 correct conversions using the 23andme_v3_hg19_ref.txt.gz data, when running "perl 23andme2vcf.pl <path to 23andme txt.zip file> genome_XYZ.vcf 4 (or 3)". In both cases the break happens after:

chrX 2689575 rs311150 G A . . . GT 1/1

I've seen this with two different individuals' SNP files, one generated in Feb 2015 and another generated in Dec 2016. Both the above rsID and the following one are still listed in the current dbSNP, and both entries in the 23andme_v5_ht19_ref.txt.gz appear valid. I've run this on a laptop and on a Linux server, so I don't think it's a resource issue. Any suggestions? Thanks.
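To narrow down where the single-allele genotypes start, something like the following could be run over the output VCF. It is only a rough sketch, not part of 23andme2vcf: the function name `summarize_genotypes` is made up for illustration, it assumes the ten whitespace-separated columns seen in the record above with GT as the first FORMAT key, and the second sample record is hypothetical.

```python
from collections import Counter


def summarize_genotypes(vcf_lines):
    """Tally diploid vs. single-allele GT values per chromosome.

    Returns (counts, first_haploid): counts maps (chrom, kind) to a tally,
    where kind is "diploid" or "haploid"; first_haploid is the
    (chrom, pos, rsid, gt) of the first single-allele record, or None.
    """
    counts = Counter()
    first_haploid = None
    for line in vcf_lines:
        # Skip blank lines and VCF header/meta lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 10:
            continue  # not a full record with a sample column
        chrom, pos, rsid = fields[0], int(fields[1]), fields[2]
        # Assumption: GT is the first key in FORMAT, as in the output above.
        gt = fields[9].split(":")[0]
        kind = "diploid" if ("/" in gt or "|" in gt) else "haploid"
        counts[(chrom, kind)] += 1
        if kind == "haploid" and first_haploid is None:
            first_haploid = (chrom, pos, rsid, gt)
    return counts, first_haploid


# The first record is the one quoted above; the second is a hypothetical
# stand-in for a "broken" record, not real output.
sample = [
    "chrX 2689575 rs311150 G A . . . GT 1/1",
    "chrX 2700000 rsHYPOTHETICAL C T . . . GT 1",
]

counts, first = summarize_genotypes(sample)
for (chrom, kind), n in sorted(counts.items()):
    print(chrom, kind, n)
print("first single-allele record:", first)
```

For a real file, passing an open file handle, e.g. `summarize_genotypes(open("genome_XYZ.vcf"))`, should show whether the "0"/"1" genotypes are confined to particular chromosomes or appear everywhere after the break.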

Larry

Dec 12 '16 18:12 lhelseth

Did you make a 23andme_v5_ht19_ref.txt.gz reference? I've included a v3 and v4. It's possible that leaving indels in a new reference would fail in that way, but I wouldn't expect it to get so far before encountering its first indel.

Dec 13 '16 02:12 arrogantrobot

Rob, No, I just used the v4 (and then the v3, when prompted after running v4). Thanks.

Larry


Dec 13 '16 13:12 lhelseth
