Andrew Riha
Thanks LaKisha. `snps` / `lineage` can't parse your converted file because it's trying to apply the AncestryDNA parser based on the comments, and...
Closing since there are no updates required for this issue.
Sorry, I closed the issue too early. Upon further investigation, `snps` should be updated to handle the H3Africa format since the generic parser is not invoked (an rsid is not...
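To illustrate the failure mode described above, here's a minimal sketch of picking a parser from a file's leading `#` comment lines, with a generic fallback. The function `detect_source` and the `HEADER_MARKERS` table are hypothetical and are not how `snps` actually detects sources. The sketch only shows why a converted file that still mentions AncestryDNA in its comments gets routed to the AncestryDNA parser.

```python
import io

# Hypothetical header markers mapped to source names (not the real snps rules).
HEADER_MARKERS = {
    "AncestryDNA": "AncestryDNA",
    "23andMe": "23andMe",
}


def detect_source(text: str) -> str:
    """Guess a file's source from its leading '#' comment lines.

    Returns "generic" when no known marker appears in the comments.
    """
    for line in io.StringIO(text):
        if not line.startswith("#"):
            break  # stop scanning at the first non-comment line
        for marker, source in HEADER_MARKERS.items():
            if marker in line:
                return source
    return "generic"
```

For example, `detect_source("# converted from AncestryDNA\nrs1\t1\t100\tAA\n")` returns `"AncestryDNA"` even though the data rows no longer follow that layout, while a file with no comments falls through to `"generic"`.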
Hi @lakishadavid, please try creating a new virtual environment and installing `lineage` again; I've updated it to support the latest version of `snps`. FYI, here are some...
Resource available here: https://supfam.mrc-lmb.cam.ac.uk/GenomePrep/datadir/badalleles.tsv.gz
That's a great idea and will really help reduce memory usage for those columns. And compared to `object`, it looks like `StringDtype` for the `rsid` column will also use less...
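Assuming the idea here is to store repetitive columns (e.g., chromosome labels) with pandas' `category` dtype, a quick way to see the effect is to compare `memory_usage(deep=True)` for the same values stored as `object` and as `category`. The helper `compare_chrom_memory` and its data are synthetic and only meant as a rough check, not a measurement on real `snps` data.

```python
import numpy as np
import pandas as pd


def compare_chrom_memory(n: int = 100_000) -> dict:
    """Compare deep memory usage of a repetitive string column as object vs category.

    The data is synthetic: chromosome labels "1".."22", "X", "Y", "MT" repeated.
    """
    labels = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]
    # Cycle through the labels to build n object values.
    values = np.array(labels, dtype=object)[np.arange(n) % len(labels)]
    as_object = pd.Series(values, dtype=object)
    as_category = as_object.astype("category")  # small integer codes + 25 categories
    return {
        "object": int(as_object.memory_usage(deep=True)),
        "category": int(as_category.memory_usage(deep=True)),
    }
```

Categoricals pay off when the number of distinct values is small relative to the number of rows; a column where nearly every value is unique, like `rsid`, gains little from this and may even use more memory.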
Note that in a quick test with one of the example files, `s._snps.index = s._snps.index.astype(pd.StringDtype())` reduces memory usage by ~2.5 times (very desirable). However, just using `.loc` with an rsid...
Upon further investigation, it looks like `object` and `pd.StringDtype()` use the same amount of memory, and resetting the index `dtype` as above actually just freed the memory used by a...
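One mechanism that can produce this effect: pandas' `Index.memory_usage` also counts the index engine's hash table once a label lookup has populated it, so a `.loc` call can make the index look bigger, and an `astype` that builds a fresh index can make that memory look "freed". The sketch below checks this on a small synthetic frame; `index_memory_demo` and its rsid-like labels are hypothetical stand-ins for `s._snps`, and exact byte counts vary by pandas version.

```python
import pandas as pd


def index_memory_demo(n: int = 10_000) -> dict:
    """Measure a DataFrame's index memory before and after a .loc lookup.

    The rsid-like labels are synthetic; a real snps DataFrame would differ.
    """
    rsids = [f"rs{i}" for i in range(n)]
    df = pd.DataFrame({"pos": range(n)}, index=pd.Index(rsids, name="rsid"))

    before = df.index.memory_usage(deep=True)
    _ = df.loc["rs42"]  # a label lookup populates the index's hash table
    after_lookup = df.index.memory_usage(deep=True)

    # astype returns a new Index whose hash table has not been built yet
    df.index = df.index.astype(pd.StringDtype())
    after_astype = df.index.memory_usage(deep=True)

    return {"before": before, "after_lookup": after_lookup, "after_astype": after_astype}
```

Comparing `before` with `after_astype` (rather than `after_lookup` with `after_astype`) is the fairer test of whether the dtype itself changes memory usage.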
Hi @willgdjones, @afaulconbridge, please check out these changes and let me know what you think. Thanks!
Central to resolving SNP issues (e.g., assigning SNPs on chrom 0 (#13), populating missing RSIDs (#19), and assigning PAR SNPs) is having a resource that maintains a list of RSIDs,...
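To make the idea of such a resource concrete, here's a sketch that uses a small lookup table with `rsid`, `chrom`, and `pos` columns to (1) reassign SNPs reported on chrom `0` by matching on rsid and (2) fill in missing rsids by matching on `chrom` and `pos`. The table layout and the helper `apply_rsid_resource` are hypothetical illustrations, not the resource or API `snps` actually uses; PAR handling is left out.

```python
import pandas as pd


def apply_rsid_resource(snps: pd.DataFrame, resource: pd.DataFrame) -> pd.DataFrame:
    """Repair a SNPs table using a hypothetical rsid -> (chrom, pos) lookup table.

    Both frames have columns rsid, chrom, pos; rsid may be missing in `snps`.
    Rsids and (chrom, pos) pairs in `resource` are assumed to be unique.
    """
    out = snps.copy()

    # 1) SNPs on chrom "0": look up their chrom/pos by rsid.
    by_rsid = resource.set_index("rsid")
    on_chrom0 = (out["chrom"] == "0") & out["rsid"].isin(by_rsid.index)
    matched = by_rsid.loc[out.loc[on_chrom0, "rsid"]]
    out.loc[on_chrom0, "chrom"] = matched["chrom"].to_numpy()
    out.loc[on_chrom0, "pos"] = matched["pos"].to_numpy()

    # 2) Missing rsids: look them up by (chrom, pos); unmatched loci stay missing.
    by_locus = resource.set_index(["chrom", "pos"])["rsid"]
    missing = out["rsid"].isna()
    keys = pd.MultiIndex.from_frame(out.loc[missing, ["chrom", "pos"]])
    out.loc[missing, "rsid"] = by_locus.reindex(keys).to_numpy()

    return out
```

Keeping both lookups keyed on the same table is the main point: one resource answers "where is this rsid?" and "which rsid is at this position?".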