Andrew Riha comments

Results: 35 comments of Andrew Riha

Consolidate code to resolve SNP issues

Alternatively, I think it could be possible to read the gzip and post-process it into compressed pickles, one for each chromosome (perhaps more chunks if these are still too large)....
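A minimal sketch of that post-processing idea, using only the standard library. It assumes (hypothetically) that the gzip holds tab-separated `rsid`, `chrom`, `pos` rows; the function names and the `chr<N>.pkl.gz` file naming are illustrative, not part of `snps`. For brevity it groups all rows in memory before writing, so a very large input would need to be streamed or chunked further.

```python
import csv
import gzip
import pickle
from collections import defaultdict
from pathlib import Path


def split_gzip_by_chromosome(src, out_dir):
    """Split a gzipped TSV into one gzip-compressed pickle per chromosome.

    Assumes each row is (rsid, chrom, pos); this layout is an assumption.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows by chromosome (held in memory for simplicity).
    rows_by_chrom = defaultdict(list)
    with gzip.open(src, "rt", newline="") as f:
        for rsid, chrom, pos in csv.reader(f, delimiter="\t"):
            rows_by_chrom[chrom].append((rsid, int(pos)))

    # Write one compressed pickle per chromosome.
    paths = {}
    for chrom, rows in rows_by_chrom.items():
        path = out_dir / f"chr{chrom}.pkl.gz"
        with gzip.open(path, "wb") as f:
            pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)
        paths[chrom] = path
    return paths


def load_chromosome(path):
    """Load the rows for a single chromosome back from its compressed pickle."""
    with gzip.open(path, "rb") as f:
        return pickle.load(f)
```

With this layout, a lookup only has to load the pickle for the chromosome it needs rather than the whole resource.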

Consolidate code to resolve SNP issues

So with the [NCBI Variation Services](https://api.ncbi.nlm.nih.gov/variation/v0/) API, requests are limited to one per second. I think the following two endpoints could be used to resolve issues: 1. [/vcf/file/set_rsids](https://api.ncbi.nlm.nih.gov/variation/v0/#/VCF/post_vcf_file_set_rsids) This could...
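One way to respect a one-request-per-second limit is a small client-side throttle called before each request. The sketch below is generic and makes no calls to the API itself; the `RateLimiter` class and its parameters are hypothetical names, and the injectable `clock` and `sleep` exist only to make it testable.

```python
import time


class RateLimiter:
    """Block so that successive calls to wait() are at least min_interval seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                # Too soon after the previous call: sleep off the difference.
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would create one `RateLimiter(1.0)` and invoke `limiter.wait()` immediately before every request to the service.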

Consolidate code to resolve SNP issues

> In our use case, for example, we actually use `snps` within a Lambda function which has no persistent memory. Would there be any way to build the resources into...

Assign SNPs on chromosome 0

Yeah, I agree that having a resource like the one discussed in #19 to look up SNPs would help here as well and would reduce repeated API calls. I'll start looking into this...

Multisample VCF

Thanks for the issue! Yes, I think a multi-sample VCF could be constructed from merged files. To do this, I'm envisioning the `SNPs` object maintaining the results of each merge...

Use Pandas nullable integer for position in normalized snps dataframe

Interesting. Yes, a nullable integer dtype would be good to handle these cases. But let's go with `pd.UInt32Dtype()`, which would minimize memory usage.
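A short illustration of the difference, with made-up positions. Without a nullable dtype, a missing value forces the column to `float64`; with `pd.UInt32Dtype()` the positions stay integers and the gap is stored as `pd.NA`. The unsigned 32-bit range (up to 4,294,967,295) is well above human chromosome lengths.

```python
import pandas as pd

# Hypothetical positions; None marks a SNP whose position is unknown.
raw = [752566, None, 1130727]

# Default inference: the missing value turns the column into float64.
as_float = pd.Series(raw)

# Nullable unsigned 32-bit integers: values stay integral, the gap becomes pd.NA.
pos = pd.Series(raw, dtype=pd.UInt32Dtype())

# The same data as nullable 64-bit integers, for a memory comparison.
pos64 = pos.astype(pd.Int64Dtype())

print(as_float.dtype, pos.dtype)
print(pos.memory_usage(index=False), pos64.memory_usage(index=False))
```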

Identify SNP array cluster

Resource available here: https://supfam.mrc-lmb.cam.ac.uk/GenomePrep/datadir/the_list.tsv.gz

Class based source readers?

Thanks @afaulconbridge, this is a great idea. Since several read functions use the same parser, I wonder if the top level `Reader` class should implement the parser functions?
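To make the question concrete, here is one possible shape for it: a base `Reader` that owns the shared parsing loop, with subclasses supplying only the per-format details. All class names, attributes, and file layouts below are hypothetical and are not the actual `snps` design.

```python
import csv
import io


class Reader:
    """Hypothetical base class holding the parser shared by several formats."""

    delimiter = "\t"
    comment = "#"

    def parse(self, text):
        # Shared logic: skip blank and comment lines, split the rest into fields.
        lines = (
            ln for ln in io.StringIO(text)
            if ln.strip() and not ln.startswith(self.comment)
        )
        return [self.to_record(row) for row in csv.reader(lines, delimiter=self.delimiter)]

    def to_record(self, row):
        raise NotImplementedError


class TabReader(Reader):
    """Tab-separated rows with a single combined genotype column."""

    def to_record(self, row):
        rsid, chrom, pos, genotype = row
        return {"rsid": rsid, "chrom": chrom, "pos": int(pos), "genotype": genotype}


class TwoAlleleCsvReader(Reader):
    """Comma-separated rows with the two alleles in separate columns."""

    delimiter = ","

    def to_record(self, row):
        rsid, chrom, pos, allele1, allele2 = row
        return {"rsid": rsid, "chrom": chrom, "pos": int(pos), "genotype": allele1 + allele2}
```

In this arrangement a new source format only overrides `delimiter`, `comment`, or `to_record`, while the line filtering and splitting live in one place.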

Consider refactoring `genotype` column

Hi @willgdjones, what are your thoughts on this?

Consider refactoring `genotype` column

Hey @willgdjones, I started looking at this as part of #44 and plan to push some parsing updates related to this soon. I was just thinking - perhaps for...