snps
AttributeError: 'str' object has no attribute '_output_dir'
I'm using aws ec2 ubuntu. It does not allow me to create an individual.
user662 = l.create_individual('User662', '/home/ubuntu/myprojectdir/AaronAzuma.zip')
Traceback (most recent call last):
  File " ", line 1, in
  File "/home/ubuntu/myprojectdir/venv/lib/python3.8/site-packages/lineage/__init__.py", line 96, in create_individual
    return Individual(name, raw_data, self._output_dir, **kwargs)
AttributeError: 'str' object has no attribute '_output_dir'
Thanks for the issue. Can you provide more details or code snippets? I just tested installing and running the README examples in a Python 3.8 virtual environment without any issues.
Thanks Andrew,
After running your example data and seeing create_individual work, I realized that my issue was with the parsing. I had already converted the format from AncestryDNA to 23andMe and then tried to use create_individual. I receive a parsing error, which doesn't allow me to go forward. My other set of files also has 4 columns like 23andMe but no headers (from the H3Africa array with another lab).
$ sed -n 1,20p lineage/inputs/myfile.txt
#AncestryDNA raw data download
#This file was generated by AncestryDNA at: 07/31/2018 23:48:22 UTC
#Data was collected using AncestryDNA array version: V2.0
#Data is formatted using AncestryDNA converter version: V1.0
...
rsid chromosome position allele1allele2
rs369202065 1 569388 GG
$ python manage.py shell
Python 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0] on linux
from lineage import Lineage
l = Lineage()
user111 = l.create_individual('User111', 'myfile.txt')
pandas.errors.ParserError: Too many columns specified: expected 5 and found 4
LaKisha
Thanks LaKisha, that helps. lineage uses the snps library to parse files, so I transferred the issue here.
snps should be able to read raw AncestryDNA or 23andMe files without conversion... However, snps could be updated to handle the format you pasted as well. Do you have a link to the tool that produces that format?
As for the H3Africa files, can you confirm that an example file would look like this (tab-separated):
rs1 1 101 AA
rs2 1 102 CC
rs3 1 103 GG
rs4 1 104 TT
rs5 1 105 --
rs6 1 106 GC
rs7 1 107 TC
rs8 1 108 AT
.
.
.
Hi Andrew,
Here is the script I'm using to convert my files from AncestryDNA to 23andMe format:
(venv) ubuntu@:~/myprojectdir/lineage/inputs$ for file in ./*.txt; do echo "converting from AncestryDNA to 23andMe format file:" $file; gawk -i inplace -F'\t' '{ print $1"\t"$2"\t"$3"\t"$4$5; }' $file; done
This line results in a text file that looks like this:
rsid chromosome position allele1allele2
rs369202065 1 569388 GG
rs199476136 1 569400 TT
rs3131972 1 752721 AG
rs114525117 1 759036 GG
rs12124819 1 776546 AA
rs4040617 1 779322 AA
rs141175086 1 780397 CC
rs115093905 1 787173 GG
rs11240777 1 798959 AG
The H3Africa file looks like this after using the command line (tab):
h3a_37_1_54676_C_T 1 54676 AA
seq-h3a_37_1_61989_G_C 1 61989 CC
seq-h3a_37_1_62271_A_G 1 62271 AA
seq-h3a_37_1_64552_G_A 1 64552 AA
seq-h3a_37_1_104072_C_T 1 104072 GG
h3a_37_1_108310_T_C 1 108310 AA
h3a_37_1_110509_G_A 1 110509 GG
seq-h3a_37_1_118617_T_C 1 118617 GG
seq-h3a_37_1_256586_T_G 1 256586 AC
h3a_37_1_404672_G_A 1 404672 AA
kgp15717912 1 534247 GG
If it helps: after converting to 23andMe format, I convert the files to VCF format for downstream use. Your tool is really quick, plus the graph. It would be great if I could use it in my pipeline. Here's my 23andMe to VCF conversion:
(venv) ubuntu@:~/myprojectdir/lineage/inputs$ for file in ./*txt; do echo "converting to vcf file:" $file; bcftools convert -c ID,CHROM,POS,AA -s ${file%.txt} --haploid2diploid -f ../references/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa --tsv2vcf $file -Oz -o ${file%.txt}.vcf.gz; done
Index multiple vcf files in prep to merge
for file in ./*.vcf.gz; do echo "indexing vcf file" $file; tabix $file; done
Merge multiple vcf file into single vcf file
bcftools merge -Oz -o MergedSamples1.vcf.gz ../inputs/*.vcf.gz
Clean MergedSamples file
bgzip -d ../results/MergedSamples.vcf.gz
grep ^"#" ../results/MergedSamples.vcf > ../results/MergedSamples0.vcf
awk -F$'\t' '{ if ( $3 ~ "rs" ) { print $0; } }' ../results/MergedSamples.vcf > ../results/MergedSamples1.vcf
awk -F$'\t' '{ if ( $3 !~ ";" ) { print $0; } }' ../results/MergedSamples1.vcf > ../results/MergedSamples2.vcf
cat ../results/MergedSamples0.vcf ../results/MergedSamples2.vcf > ../results/MergedSamplesEdited.vcf
sed -n 1,20p MergedSamplesEdited.vcf
gawk -i inplace '!a[$2]++' ../results/MergedSamplesEdited.vcf
bgzip ../results/MergedSamplesEdited.vcf
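For readers who would rather do the cleanup step in Python, here is a rough sketch of the same filtering: keep header lines, keep records whose ID column contains "rs" and no ";", and drop records whose position has already been seen. The function name `clean_vcf_lines` is hypothetical and the sketch works on plain (uncompressed) VCF lines. Unlike the gawk step, which runs over every line, it leaves header lines alone and deduplicates only the records.

```python
def clean_vcf_lines(lines):
    """Yield header lines plus filtered, position-deduplicated VCF records.

    Hypothetical helper: mirrors the grep/awk/gawk cleanup, not any library API.
    """
    seen_positions = set()
    for line in lines:
        if line.startswith("#"):
            # Header and column lines pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        pos, variant_id = fields[1], fields[2]
        # Keep only IDs that contain "rs" and are not ";"-joined lists.
        if "rs" not in variant_id or ";" in variant_id:
            continue
        # Same key as the gawk step (column 2, POS); first record wins.
        if pos in seen_positions:
            continue
        seen_positions.add(pos)
        yield line


example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "1\t100\trs1\tA\tG\n",
    "1\t100\trs2\tA\tG\n",      # duplicate position, dropped
    "1\t200\tkgp5\tA\tG\n",     # ID without "rs", dropped
    "1\t300\trs3;rs4\tA\tG\n",  # multiple IDs, dropped
    "2\t400\trs5\tC\tT\n",
]
cleaned = list(clean_vcf_lines(example))
```

Keying the deduplication on the chromosome and position together, rather than position alone, would avoid dropping variants that share a coordinate on different chromosomes; that is a design choice to weigh, not something the original commands do.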
Thanks LaKisha. The issue with snps / lineage not being able to parse your converted file is because it's trying to apply the AncestryDNA parser based on the comments, and for that it looks for whitespace between the alleles and column headers.
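To make the mismatch concrete, here is a small sketch that checks whether a file's comments mention AncestryDNA while its rows no longer have the five whitespace-separated columns (rsid, chromosome, position, allele1, allele2) of a raw AncestryDNA file. The helper `diagnose_ancestry_columns` is hypothetical and is not how snps itself detects formats; it only illustrates why the converted file reports "expected 5 and found 4".

```python
def diagnose_ancestry_columns(text):
    """Report whether AncestryDNA-labelled text still has five columns per row.

    Hypothetical diagnostic for illustration; not part of snps or lineage.
    """
    lines = text.splitlines()
    # The source is inferred from the "#..." comment lines at the top.
    declared = any(l.startswith("#") and "AncestryDNA" in l for l in lines)
    data = [l for l in lines if l.strip() and not l.startswith("#")]
    # Count whitespace-separated fields on the header row and every data row.
    widths = {len(l.split()) for l in data}
    if declared and widths != {5}:
        return f"comments say AncestryDNA (5 columns) but rows have {sorted(widths)} columns"
    return "ok"


converted = (
    "#AncestryDNA raw data download\n"
    "rsid chromosome position allele1allele2\n"
    "rs369202065 1 569388 GG\n"
)
raw = (
    "#AncestryDNA raw data download\n"
    "rsid chromosome position allele1 allele2\n"
    "rs369202065 1 569388 G G\n"
)
converted_report = diagnose_ancestry_columns(converted)
raw_report = diagnose_ancestry_columns(raw)
```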
But you don't need to convert the file, since snps can already read AncestryDNA (and the other formats discussed in the README). Give that a try and let me know how it works.
As for the H3Africa file, snps should also be able to read that.
And if you need a VCF file, you can save the SNPs in VCF format.
Closing since there are no updates required for this issue.
Sorry, I closed the issue too early. Upon further investigation, snps should be updated to handle the H3Africa format since the generic parser is not invoked (an rsid is not in the first line). Also, the generic parser wouldn't be able to parse this due to multiple whitespace.
So to handle this, snps could do either (or both) of the following:
- check if "h3a" is in the first line and apply a parser similar to the AncestryDNA parser with multiple whitespace
- apply a generic parser as a last check that tries to read four or five column files with multiple whitespace
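The second option could look roughly like the sketch below: a fallback that splits each row on any run of whitespace and accepts either four columns (genotype already joined) or five (separate allele columns). `parse_generic` is a hypothetical name, and this is only an illustration of the idea under those assumptions, not the parser that snps implements.

```python
import io


def parse_generic(handle):
    """Parse four- or five-column genotype rows separated by runs of whitespace.

    Hypothetical fallback parser for illustration; not the snps implementation.
    """
    rows = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()  # no argument: splits on any run of spaces/tabs
        if len(fields) == 5:
            # Separate allele columns: join them into one genotype.
            fields = fields[:3] + [fields[3] + fields[4]]
        if len(fields) != 4:
            raise ValueError(f"expected 4 or 5 columns, got {len(fields)}: {line!r}")
        rsid, chrom, pos, genotype = fields
        if not pos.isdigit():
            continue  # a column-header row such as "rsid chromosome position ..."
        rows.append((rsid, chrom, int(pos), genotype))
    return rows


sample = io.StringIO(
    "h3a_37_1_54676_C_T  1   54676   AA\n"
    "seq-h3a_37_1_61989_G_C\t1\t61989\tCC\n"
    "rsid chromosome position allele1 allele2\n"
    "rs3131972 1 752721 A G\n"
)
rows = parse_generic(sample)
```

Because identifiers are taken as-is from the first column, rows such as `h3a_...` or `kgp...` are accepted without requiring an rsid in the first line.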
Hi Andrew, I tried again with fresh AncestryDNA zip files. I'm still getting the same error message.
s = SNPs("/home/ubuntu/myprojectdir/lineage/inputs/Person1.zip")
s.source
'AncestryDNA'
s.build
37
s.assembly
'GRCh37'
s.count
Traceback (most recent call last):
  File " ", line 1, in
AttributeError: 'SNPs' object has no attribute 'count'
user662 = l.create_individual('User662', '/home/ubuntu/myprojectdir/lineage/inputs/Person1.zip')
Traceback (most recent call last):
  File " ", line 1, in
  File "/home/ubuntu/myprojectdir/venv/lib/python3.8/site-packages/lineage/__init__.py", line 96, in create_individual
    return Individual(name, raw_data, self._output_dir, **kwargs)
AttributeError: 'str' object has no attribute '_output_dir'
Hi @lakishadavid, please try to create a new virtual environment and install lineage again - I've updated it to support the latest version of snps. FYI, here are some additional installation directions: https://lineage.readthedocs.io/en/latest/installation.html.