Kevin Arvai

Results 22 comments of Kevin Arvai

The best file type would be a multi-sample VCF. Each record in the VCF would have genotypes listed for each of the 3,202 genomes at that locus. Breaking these...

I tried a couple of queries with what I thought would be the minimum fields needed for comparison, but I had limited success. Here's what I did: 1. Lifted over...

Hey Daniel, I was able to query Athena, will compare the genotypes this weekend! ```sql SELECT * FROM var_nested WHERE (chrom, pos) IN (('chr1', 101244007), ('chr1', 151150013), ('chr1', 159204893), ('chr2',...

Hi @dbrami -- sorry for the delayed response, I came back to look at this and am finally getting somewhere. See [this notebook](https://github.com/arvkevi/ezancestry/blob/5526c5ef1e1482258dd47972c76310a24e765f01/dragen_comparison.ipynb) on the `kevin/dragen` branch for initial results...

Hi @dbrami, I will assume that NaNs are 0|0 genotypes, train a model on the Dragen data, and then compare the performance of the two models with 5-fold CV. I'll try to...

Hey @dbrami, I've updated the analysis [here](https://github.com/arvkevi/ezancestry/blob/kevin/dragen/dragen_comparison.ipynb). The 5-fold CV performance (log loss) is better when using the 1kG data compared to Dragen. It looks like the Dragen SNPs had...

Hi @RosaDeSa 👋🏼 were you able to figure out what the issue was? If so, it could be helpful for others if you share your solution. I'm unsure how ezancestry...

ezancestry uses the `snps` package to read VCFs in [process.py](https://github.com/arvkevi/ezancestry/blob/c66fc37335e5b505b6ce7b7581c2a1284c80d311/ezancestry/process.py#LL258C44-L258C44). Are the two samples related? Do they have the exact same set of AISNPs?
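One way to answer the "same set of AISNPs" question is to compare the ID columns of the two VCFs directly. The sketch below is not ezancestry code: it uses only the standard library, the helper names `vcf_rsids` and `make_vcf` are made up for illustration, and the two in-memory VCFs contain placeholder identifiers rather than real variants.

```python
import io

def vcf_rsids(handle):
    """Collect identifiers from the ID column of the data lines of a VCF."""
    rsids = set()
    for line in handle:
        # Skip meta-information lines, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # The third VCF column is ID; "." means the record has no identifier,
        # and several identifiers may be separated by semicolons.
        for rsid in fields[2].split(";"):
            if rsid != ".":
                rsids.add(rsid)
    return rsids

def make_vcf(rows):
    """Build a tiny in-memory VCF for the demo (hypothetical helper)."""
    header = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    body = "".join("\t".join(row) + "\n" for row in rows)
    return io.StringIO(header + body)

# Placeholder records; the identifiers and positions are not real variants.
sample_a = make_vcf([
    ("1", "1000", "rs_example_1", "A", "G", ".", "PASS", "."),
    ("2", "2000", "rs_example_2", "C", "T", ".", "PASS", "."),
])
sample_b = make_vcf([
    ("1", "1000", "rs_example_1", "A", "G", ".", "PASS", "."),
    ("3", "3000", ".", "G", "A", ".", "PASS", "."),
])

ids_a = vcf_rsids(sample_a)
ids_b = vcf_rsids(sample_b)
shared = ids_a & ids_b   # identifiers present in both samples
only_a = ids_a - ids_b   # identifiers only the first sample has
only_b = ids_b - ids_a   # identifiers only the second sample has
print(sorted(shared), sorted(only_a), sorted(only_b))
```

With real files, the `io.StringIO` objects would be replaced by open file handles (or `gzip.open(path, "rt")` for compressed VCFs), and the resulting sets could then be intersected with the rsids listed in the relevant .aisnps file.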

Hey @RosaDeSa, one other thing that could be contributing to this is having too many missing AISNPs in the vcf. When you call predict, it should log a message indicating...

Hmm, the merge is on both rsid AND position. Unfortunately, this requires a VCF annotated with rsids, and the positions must match the hg19 positions from the .aisnps files. You...
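To illustrate why a join on both rsid and position can silently drop variants, here is a minimal standard-library sketch. It is an assumption-laden stand-in, not the merge ezancestry performs: the function name `merge_on_rsid_and_pos`, the record layout, and all identifiers and coordinates are hypothetical.

```python
def merge_on_rsid_and_pos(aisnps, records):
    """Inner join that keeps a variant only if rsid AND position both match."""
    index = {(rec["rsid"], rec["pos"]): rec for rec in records}
    merged = []
    for snp in aisnps:
        key = (snp["rsid"], snp["pos"])
        if key in index:
            merged.append({**snp, "genotype": index[key]["genotype"]})
    return merged

# Reference panel with positions in one coordinate system (placeholder values).
aisnps = [
    {"rsid": "rs_example_1", "pos": 1000},
    {"rsid": "rs_example_2", "pos": 2000},
    {"rsid": "rs_example_3", "pos": 3000},
]

# Sample records: one full match, one with the right rsid but a shifted
# position (as happens with a different genome build), one with no rsid.
vcf_records = [
    {"rsid": "rs_example_1", "pos": 1000, "genotype": "0|1"},
    {"rsid": "rs_example_2", "pos": 2050, "genotype": "1|1"},
    {"rsid": ".", "pos": 3000, "genotype": "0|0"},
]

matched = merge_on_rsid_and_pos(aisnps, vcf_records)
missing = [s["rsid"] for s in aisnps
           if s["rsid"] not in {m["rsid"] for m in matched}]
print(len(matched), missing)
```

Only the first record survives the join here. The second is lost because its position differs, and the third because it carries no rsid, which is the same pattern that would leave a sample with many missing AISNPs.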