snpy
VCF Files
Hi,
I downloaded the latest bulk dump from openSNP, which currently contains 4700 raw data files from various companies. I tested your library (with Python 3.7) against all of the openSNP files, and some of them fail to parse, for example:
user6020_file4548_yearofbirth_unknown_sex_unknown.ancestry.txt
I also installed the PyVCF parser.
Here is the traceback Python printed for one user's file:
user6020_file4548_yearofbirth_unknown_sex_unknown.ancestry.txt
Traceback (most recent call last):
  File "/home/yoan/Bureau/ADN/openSNP_rawdata/opensnp_datadump.current/parser.py", line 13, in <module>
    for i in snps:
  File "/usr/local/lib/python3.7/dist-packages/sn.py", line 78, in _23andme_ancestry
    for row in handle:
  File "/usr/lib/python3.7/csv.py", line 112, in __next__
    row = next(self.reader)
  File "/usr/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 12: invalid start byte
The code:
import os

import sn

folder_content = os.listdir(os.getcwd())
for filename in folder_content:
    print(filename)
    if os.path.isfile(filename):
        #try:
        snps = sn.parse(filename)
        cpt = 0
        # Print only the first 10 SNPs of each file.
        for i in snps:
            print(i)
            cpt += 1
            if cpt == 10:
                break
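To list every failing file in one pass instead of stopping at the first traceback, the loop could catch exceptions per file, which is roughly what the commented-out `try` hints at. This is only a sketch: `find_unparseable` is a hypothetical helper name, and it takes the parser as an argument (here that would be `sn.parse`) rather than assuming anything about the library's internals.

```python
import os


def find_unparseable(paths, parse, limit=10):
    """Return (path, error) pairs for files that `parse` cannot read.

    `parse` is any callable that takes a path and returns an iterable of
    records; in the script above that would be `sn.parse`.
    """
    failures = []
    for path in paths:
        if not os.path.isfile(path):
            continue  # skip directories and other non-files
        try:
            # Read only the first `limit` records; that is enough to
            # trigger decoding errors near the start of a file.
            for count, _record in enumerate(parse(path), start=1):
                if count >= limit:
                    break
        except Exception as exc:  # broad on purpose: we only want a report
            failures.append((path, repr(exc)))
    return failures
```

With the script above, this would be called as `find_unparseable(os.listdir(os.getcwd()), sn.parse)`, assuming `sn.parse` accepts a file name as it does in that script.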
Thanks in advance for your help!
I just ran into the same problem. The issue is that some of the files are compressed. You can use zcat to decompress them.
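If you would rather stay in Python than shell out to zcat, something like the following might work. It is a minimal sketch that assumes the affected files are gzip-compressed (the format zcat handles); other archive formats such as zip are not covered. The helper names `is_gzipped` and `decompress_if_gzipped` are made up for illustration.

```python
import gzip
import shutil

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzipped(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC


def decompress_if_gzipped(path, out_path):
    """Decompress `path` to `out_path` if it is gzipped.

    Returns the path that should be handed to the parser: `out_path`
    when a decompressed copy was written, otherwise `path` unchanged.
    """
    if not is_gzipped(path):
        return path
    with gzip.open(path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

The returned path can then be passed to `sn.parse` in place of the original file name, for example `sn.parse(decompress_if_gzipped(name, name + ".decompressed"))`, assuming `sn.parse` takes a path as in the script above.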