Andrew Riha
Alternatively, I think it could be possible to read the gzipped file and post-process it into compressed pickles, one for each chromosome (perhaps more chunks if these are still too large)....
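A minimal sketch of that post-processing step. It assumes the gzipped file is tab-separated with a header row and hypothetical `rsid`, `chrom`, and `pos` columns, since the real file's layout isn't given here. `split_by_chromosome` is an illustrative name, not part of `snps`, and the tiny file written in the example stands in for the real download.

```python
import gzip
import os
import tempfile

import pandas as pd


def split_by_chromosome(gz_path, out_dir):
    """Read a gzipped TSV and write one gzip-compressed pickle per chromosome.

    Hypothetical column layout: rsid, chrom, pos. Returns {chrom: pickle_path}.
    """
    # pandas infers gzip compression from the ".gz" extension
    df = pd.read_csv(gz_path, sep="\t", dtype={"chrom": str})
    paths = {}
    for chrom, group in df.groupby("chrom", sort=False):
        path = os.path.join(out_dir, f"chrom_{chrom}.pkl.gz")
        # Each chromosome becomes its own small, compressed file
        group.reset_index(drop=True).to_pickle(path, compression="gzip")
        paths[chrom] = path
    return paths


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the downloaded gzip file
    src = os.path.join(tmp, "variants.tsv.gz")
    with gzip.open(src, "wt") as fh:
        fh.write("rsid\tchrom\tpos\nrs1\t1\t100\nrs2\t1\t200\nrs3\tX\t300\n")
    paths = split_by_chromosome(src, tmp)
    # Later, only the chromosome that is needed has to be loaded
    chrom1 = pd.read_pickle(paths["1"])
```

If a single chromosome is still too large, the same loop could group by (`chrom`, a position bin) instead.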
So with the [NCBI Variation Services](https://api.ncbi.nlm.nih.gov/variation/v0/) API, requests are limited to one per second. I think the following two endpoints could be used to resolve issues:

1. [/vcf/file/set_rsids](https://api.ncbi.nlm.nih.gov/variation/v0/#/VCF/post_vcf_file_set_rsids) This could...
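With a limit of one request per second, any client that loops over many rsids has to throttle itself. A minimal sketch, assuming a hypothetical `fetch` callable that performs a single request; no real NCBI call is made here, and `RateLimiter` and `fetch_all` are illustrative names. The interval is shortened in the usage example so it runs quickly; for the NCBI limit it would be `1.0`.

```python
import threading
import time


class RateLimiter:
    """Block callers so that at most one call starts per `min_interval` seconds."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._last = None  # monotonic time at which the previous call started

    def wait(self):
        with self._lock:
            now = time.monotonic()
            if self._last is not None:
                remaining = self.min_interval - (now - self._last)
                if remaining > 0:
                    time.sleep(remaining)
            self._last = time.monotonic()


def fetch_all(items, fetch, limiter):
    """Call the (hypothetical) `fetch` once per item, respecting the limiter."""
    results = []
    for item in items:
        limiter.wait()
        results.append(fetch(item))
    return results


# Usage with a fake fetch function and a short interval
limiter = RateLimiter(min_interval=0.05)
start = time.monotonic()
out = fetch_all(["rs1", "rs2", "rs3"], lambda rsid: rsid.upper(), limiter)
elapsed = time.monotonic() - start
```

The lock is held while sleeping, so several threads sharing one limiter are serialized and still respect the interval; with a single thread it is simply a pause between calls.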
> In our use case, for example, we actually use `snps` within a Lambda function which has no persistent memory. Would there be any way to build the resources into...
Yeah, I agree that having a resource like the one discussed in #19 to look up SNPs would help here as well and would reduce repeated API calls. I'll start looking into this...
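One way a local resource could cut down on repeated API calls: consult a local table first, fall back to the API only for unknown rsids, and memoize the answers. This is a sketch with a hypothetical in-memory `LOCAL_TABLE` and a stand-in `query_api` that records calls instead of making network requests.

```python
from functools import lru_cache

# Hypothetical local resource: rsid -> (chrom, pos), loaded once (e.g. from a downloaded file)
LOCAL_TABLE = {"rs1": ("1", 100), "rs2": ("1", 200)}

api_calls = []  # records each simulated remote request


def query_api(rsid):
    """Stand-in for a remote API request (hypothetical, no network access)."""
    api_calls.append(rsid)
    return ("X", 300) if rsid == "rs3" else None


@lru_cache(maxsize=None)
def lookup(rsid):
    """Return (chrom, pos), preferring the local table.

    Results are memoized, so each rsid reaches the API at most once per process.
    """
    if rsid in LOCAL_TABLE:
        return LOCAL_TABLE[rsid]
    return query_api(rsid)


results = [lookup(r) for r in ["rs1", "rs3", "rs3", "rs2", "rs3"]]
```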
Thanks for the issue! Yes, I think a multi-sample VCF could be constructed from merged files. To do this, I'm envisioning the `SNPs` object maintaining the results of each merge...
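A rough sketch of the merge bookkeeping, assuming each sample is a pandas DataFrame with hypothetical `rsid`, `chrom`, `pos`, and `genotype` columns. Each merge adds one genotype column per sample, which maps naturally onto the per-sample columns of a multi-sample VCF. `merge_samples` is an illustrative helper, not the `SNPs` API.

```python
import pandas as pd


def merge_samples(samples):
    """Outer-merge per-sample tables on (rsid, chrom, pos).

    `samples` maps a sample name to a DataFrame with columns rsid, chrom, pos,
    genotype. The result keeps one genotype column per sample; a SNP missing
    from a sample gets NaN in that sample's column.
    """
    keys = ["rsid", "chrom", "pos"]
    merged = None
    for name, df in samples.items():
        # Name the genotype column after its sample so columns don't collide
        df = df[keys + ["genotype"]].rename(columns={"genotype": name})
        merged = df if merged is None else merged.merge(df, on=keys, how="outer")
    # Sort by chrom (as strings) and then by position
    return merged.sort_values(["chrom", "pos"]).reset_index(drop=True)


sample_a = pd.DataFrame(
    {"rsid": ["rs1", "rs2"], "chrom": ["1", "1"], "pos": [100, 200], "genotype": ["AA", "AG"]}
)
sample_b = pd.DataFrame(
    {"rsid": ["rs2", "rs3"], "chrom": ["1", "2"], "pos": [200, 50], "genotype": ["GG", "CT"]}
)
table = merge_samples({"sample_a": sample_a, "sample_b": sample_b})
```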
Interesting. Yes, a nullable integer dtype would be a good way to handle these cases. But let's go with `pd.UInt32Dtype()`, which would minimize memory usage.
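An illustration of why the nullable dtype helps. A plain integer column can't hold missing values, so pandas falls back to float64, whereas `pd.UInt32Dtype()` keeps the values as integers with `<NA>` and stores 4 bytes per value plus a 1-byte mask (versus 8 + 1 for `Int64`). The positions below are made-up example values.

```python
import pandas as pd

# Made-up positions; None marks a value that couldn't be determined
positions = [752566, None, 248956422]

# Without a nullable dtype, the missing value forces a float64 column
as_float = pd.Series(positions)

# Nullable unsigned 32-bit integers keep the values integral and show <NA>
as_uint32 = pd.Series(positions, dtype=pd.UInt32Dtype())

# A float column containing NaN (e.g. after a merge) can be converted too
converted = as_float.astype(pd.UInt32Dtype())

# Compare storage against the 64-bit nullable integer type
u32_bytes = as_uint32.memory_usage(index=False)
i64_bytes = pd.Series(positions, dtype=pd.Int64Dtype()).memory_usage(index=False)
```

An unsigned type fits here because positions are never negative, and 32 bits cover values up to 4,294,967,295.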
Resource available here: https://supfam.mrc-lmb.cam.ac.uk/GenomePrep/datadir/the_list.tsv.gz
Thanks @afaulconbridge, this is a great idea. Since several read functions use the same parser, I wonder whether the top-level `Reader` class should implement the parser functions.
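One way the shared-parser idea could look: the base class owns the common parsing routine, and each format-specific read method passes in only what differs (here, just the separator). Everything in this sketch (`_parse_tabular`, `read_tab_separated`, `read_comma_separated`, the four-column layout) is hypothetical and is not the existing `snps` `Reader`.

```python
import io

import pandas as pd


class Reader:
    """Hypothetical sketch: shared parsing lives on the base class."""

    def __init__(self, text):
        self.text = text

    def _parse_tabular(self, sep, comment="#", names=("rsid", "chrom", "pos", "genotype")):
        """Common parser reused by several read functions."""
        return pd.read_csv(
            io.StringIO(self.text),
            sep=sep,
            comment=comment,  # lines starting with "#" are skipped
            header=None,
            names=list(names),
            dtype={"chrom": str, "genotype": str},
        )

    # Format-specific readers only supply their differences
    def read_tab_separated(self):
        return self._parse_tabular(sep="\t")

    def read_comma_separated(self):
        return self._parse_tabular(sep=",")


tsv = Reader("# comment line\nrs1\t1\t100\tAA\nrs2\t1\t200\tAG\n").read_tab_separated()
csv = Reader("rs3,2,50,CT\n").read_comma_separated()
```

Keeping the parser on the base class means a fix to, say, comment handling applies to every format at once.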
Hi @willgdjones, what are your thoughts on this?
Hey @willgdjones, I started looking at this as part of #44 and plan to push some parsing updates related to this soon. I was just thinking: perhaps for...