SNPRelate
Combining files
Thanks for the package. I'm running v1.16.0 (the latest my current version of R will allow) and am having trouble loading multiple files. I suspect I am simply doing something wrong, or hitting something that is no longer an issue in later versions. I have M gds files (e.g., Sample1.gds, Sample2.gds) that I generated using snpgdsVCF2GDS, and I would like to load them together for PCA and other analyses. If I do:
snpgdsCombineGeno(c("Sample1.gds","Sample2.gds"), "Merged.gds")
it merges the samples into one sample, so when I run snpgdsPCA I end up with a single data point rather than two.
I have tried adding a new ID to the individual files, e.g.:
genofile = openfn.gds("Sample1.gds", readonly=FALSE)
add.gdsn(genofile, "sample.annot", val="Sample1")
but this doesn't change things. Is there a way to load multiple files for use with snpgdsPCA?
Okay, I have figured this out. If anyone else is having similar issues (e.g., with older versions), it's because each of the gds files I generated had the same (default) sample.id, so snpgdsCombineGeno merged them together. If you open each gds file and assign a unique sample.id, snpgdsCombineGeno will keep them separate, e.g.:
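# open each file for writing, overwrite sample.id, then close it
genofile <- openfn.gds("File1.gds", readonly=FALSE)
add.gdsn(genofile, "sample.id", "NewID1", replace=TRUE)
closefn.gds(genofile)  # close the file before opening the next one
genofile <- openfn.gds("File2.gds", readonly=FALSE)
add.gdsn(genofile, "sample.id", "NewID2", replace=TRUE)
closefn.gds(genofile)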
Then you can rerun snpgdsCombineGeno and do downstream analyses.
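For more than a couple of files, the same fix can be wrapped in a small helper. This is only a rough sketch: set_sample_id is a hypothetical name I made up, not part of gdsfmt or SNPRelate, and it assumes each file holds exactly one sample (so a single new ID is the right length). The file names and IDs are the placeholders from above, and the merge step only runs if those files actually exist.

```r
library(gdsfmt)

# Hypothetical helper (not part of gdsfmt/SNPRelate): overwrite the
# sample.id node of one GDS file and return the IDs now stored in it.
# Assumes length(new_id) matches the number of samples in the file.
set_sample_id <- function(gds_path, new_id) {
  gds <- openfn.gds(gds_path, readonly = FALSE)
  # always close the file, even if an error occurs below
  on.exit(closefn.gds(gds), add = TRUE)
  add.gdsn(gds, "sample.id", val = new_id, replace = TRUE)
  invisible(read.gdsn(index.gdsn(gds, "sample.id")))
}

# Placeholder file names and IDs taken from the post above
files   <- c("File1.gds", "File2.gds")
new_ids <- c("NewID1", "NewID2")

if (all(file.exists(files))) {
  for (i in seq_along(files)) {
    set_sample_id(files[i], new_ids[i])
  }

  library(SNPRelate)
  snpgdsCombineGeno(files, "Merged.gds")

  # Check that the merged file now lists one ID per input file
  merged <- snpgdsOpen("Merged.gds")
  print(read.gdsn(index.gdsn(merged, "sample.id")))
  snpgdsClose(merged)
}
```

Printing sample.id from Merged.gds before running snpgdsPCA is a quick way to confirm the samples were kept separate.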