
Population Assignments for PCA plots

Open carlahurt opened this issue 4 years ago • 1 comments

Hello, I am working on a PCA analysis of some populations for a conservation genetics project on a crayfish species. My DAPC analysis did not show significant structure between sites, so I thought I would use a PCA approach, as I understand it looks at differences between individuals rather than between groups. I am able to follow the SNPRelate tutorial up to a point, but my VCF file does not contain population assignment information, so I cannot see the population affiliation of the data points on the plots. I see that you are importing a population file, but I was not able to see how it is formatted. I’m pasting a screenshot of my R code. Can you tell me the format of the file you are using to add population information? Also, is it possible to label individuals in the plots? I can see that I have a couple of outlier individuals, and I would like to look more closely at the data to see if there is something fishy. (Screenshot: snprelate popns)

carlahurt, Feb 05 '21 20:02

pop_code is just a character vector, so your question is more about R programming itself than about SNPRelate. You can import pop_code from a text file with one line per individual, in the same order as the samples, e.g., pop_code <- readLines("your_file"):

YRI
YRI
CEU
...
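
As a rough sketch of the import step described above: the file name, its three labels, and the sample IDs below are placeholders invented for illustration, and the length check is an extra safeguard that is not part of the original reply. In a real analysis the sample IDs would presumably come from the PCA result rather than being typed in by hand.

```r
# Hypothetical input: write a tiny population file so the sketch runs on its own.
pop_file <- tempfile(fileext = ".txt")
writeLines(c("YRI", "YRI", "CEU"), pop_file)

# One population label per line, assumed to be in the same order as the samples.
pop_code <- readLines(pop_file)

# Placeholder sample IDs (assumption); replace with the IDs from your own data.
sample_id <- c("ind1", "ind2", "ind3")

# Extra safeguard: every sample needs exactly one population label.
stopifnot(length(pop_code) == length(sample_id))

# Quick look at how many individuals fall in each population.
print(table(pop_code))
```

If the lengths differ, the labels and samples are out of step, and any colouring by population in the plot would be misleading.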

Finally, merge it with the sample IDs and eigenvectors:

  sample.id pop         EV1         EV2
1   NA19152 YRI -0.08237338 -0.01091830
2   NA19139 YRI -0.08299277 -0.01035197
3   NA18912 YRI -0.08160415 -0.01412062
4   NA19160 YRI -0.08695621 -0.01391751
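
A minimal sketch of building a table like the one above and labelling individuals on the plot, which the question also asked about. To keep it runnable without a GDS file, the pca list below is a stand-in that mimics the sample.id and eigenvect components returned by snpgdsPCA(), filled with the four rows shown above; the plotting choices (base graphics, text() for labels) are one option among several, not necessarily what the tutorial does.

```r
# Stand-in for the result of snpgdsPCA(); values copied from the table above.
pca <- list(
  sample.id = c("NA19152", "NA19139", "NA18912", "NA19160"),
  eigenvect = cbind(
    c(-0.08237338, -0.08299277, -0.08160415, -0.08695621),
    c(-0.01091830, -0.01035197, -0.01412062, -0.01391751)
  )
)

# Population labels, assumed to be in the same order as pca$sample.id.
pop_code <- c("YRI", "YRI", "YRI", "YRI")

# Combine sample IDs, population labels, and the first two eigenvectors.
tab <- data.frame(
  sample.id = pca$sample.id,
  pop = factor(pop_code),
  EV1 = pca$eigenvect[, 1],
  EV2 = pca$eigenvect[, 2],
  stringsAsFactors = FALSE
)

# Scatter plot coloured by population.
plot(tab$EV1, tab$EV2, col = as.integer(tab$pop), pch = 19,
     xlab = "eigenvector 1", ylab = "eigenvector 2")

# Write each sample ID just above its point so outliers can be identified.
text(tab$EV1, tab$EV2, labels = tab$sample.id, pos = 3, cex = 0.7)

# Legend mapping colours back to population labels.
legend("bottomright", legend = levels(tab$pop),
       col = seq_along(levels(tab$pop)), pch = 19)
```

With many individuals the labels will overlap; in that case it may be easier to label only the points of interest by subsetting tab before calling text().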

zhengxwen, Feb 07 '21 00:02