
Combining files

Open: cap76 opened this issue 2 years ago • 1 comment

Thanks for the package. I'm running v1.16.0 (the latest version my current R installation allows) and am having trouble loading multiple files. I suspect I am either doing something wrong or hitting something that is no longer an issue in later versions. I have M GDS files (e.g., Sample1.gds, Sample2.gds), generated with snpgdsVCF2GDS, that I would like to load for PCA and other analyses. If I run:

snpgdsCombineGeno(c("Sample1.gds","Sample2.gds"), "Merged.gds")

It merges the samples into one sample, so when I run snpgdsPCA I end up with a single data point rather than two.

I have tried adding new IDs to the individual files, e.g.:

genofile = openfn.gds("Sample1.gds", readonly=FALSE)
add.gdsn(genofile, "sample.annot", val="Sample1")

but this doesn't change things. Is there a way to load multiple files for use with snpgdsPCA?
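
One way to narrow this down is to look at the sample.id node stored in each file. The following is a minimal sketch, assuming each file has a top-level sample.id node (the usual layout for files written by snpgdsVCF2GDS) and that the gdsfmt package is installed. The helper names read_sample_ids and shared_ids are hypothetical and not part of SNPRelate or gdsfmt.

```r
# Read the sample.id node from each GDS file (needs the gdsfmt package).
# Assumption: every file has a top-level "sample.id" node.
read_sample_ids <- function(files) {
  ids <- lapply(files, function(fn) {
    f <- gdsfmt::openfn.gds(fn)              # read-only by default
    on.exit(gdsfmt::closefn.gds(f))          # always release the file handle
    gdsfmt::read.gdsn(gdsfmt::index.gdsn(f, "sample.id"))
  })
  names(ids) <- files
  ids
}

# Hypothetical helper: IDs that occur in more than one file.
shared_ids <- function(id_list) {
  per_file <- unlist(lapply(id_list, unique), use.names = FALSE)
  unique(per_file[duplicated(per_file)])
}

# Usage (file names taken from the question):
# shared_ids(read_sample_ids(c("Sample1.gds", "Sample2.gds")))
```

If the result is non-empty, the files carry overlapping sample IDs, which may explain why the samples are not kept apart after combining.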

cap76, Sep 11 '22 16:09

Okay, I have figured this out. If anyone else runs into a similar issue (e.g., with older versions): each of the GDS files I generated had the same (default) sample.id, so snpgdsCombineGeno treated them as the same sample and merged them. If you open each GDS file and assign a unique ID, snpgdsCombineGeno will keep the samples separate, e.g.:

genofile <- openfn.gds("File1.gds", readonly=FALSE)
add.gdsn(genofile, "sample.id", "NewID1", replace=TRUE)
closefn.gds(genofile)  # close so the change is written and the file is released
genofile <- openfn.gds("File2.gds", readonly=FALSE)
add.gdsn(genofile, "sample.id", "NewID2", replace=TRUE)
closefn.gds(genofile)

Then you can rerun snpgdsCombineGeno and do downstream analyses.
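
For more than a couple of files, the same steps could be wrapped up roughly as follows. This is only a sketch under a few assumptions: each input file holds exactly one sample (so sample.id can be replaced by a single value), IDs derived from the file names are acceptable, and both gdsfmt and SNPRelate are installed. The helpers ids_from_files, set_sample_id and combine_and_pca are hypothetical names, not package functions.

```r
# Hypothetical helper: derive one unique ID per file from its file name.
ids_from_files <- function(files) {
  make.unique(tools::file_path_sans_ext(basename(files)))
}

# Overwrite sample.id in a single-sample GDS file (needs gdsfmt).
# Assumption: the file contains exactly one sample.
set_sample_id <- function(fn, new_id) {
  f <- gdsfmt::openfn.gds(fn, readonly = FALSE)
  on.exit(gdsfmt::closefn.gds(f))            # write changes and release the file
  gdsfmt::add.gdsn(f, "sample.id", new_id, replace = TRUE)
  invisible(new_id)
}

# Relabel every file, merge them, then run PCA on the result (needs SNPRelate).
combine_and_pca <- function(files, out_fn = "Merged.gds") {
  ids <- ids_from_files(files)
  for (i in seq_along(files)) set_sample_id(files[i], ids[i])
  SNPRelate::snpgdsCombineGeno(files, out_fn)
  geno <- SNPRelate::snpgdsOpen(out_fn)
  on.exit(SNPRelate::snpgdsClose(geno))
  SNPRelate::snpgdsPCA(geno)
}

# Usage (file names from the question):
# pca <- combine_and_pca(c("Sample1.gds", "Sample2.gds"))
```

Note that set_sample_id modifies the input files in place, so it may be worth working on copies.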

cap76, Sep 13 '22 11:09