How to screen for regions under selection based on XPCLR values

Open lvqiang0120 opened this issue 1 year ago • 1 comments

Hi, I recently used the Python version of XPCLR to calculate XPCLR scores between two populations, and I have some questions about the results. First, the result file has 13 columns: id, chrom, start, stop, pos_start, pos_stop, modelL, nullL, sel_coef, nSNPs, nSNPs_avail, xpclr, and xpclr_norm. Is the 12th column (xpclr) the final XPCLR score?

Second, I've noticed that in some articles the authors calculated XPCLR scores within non-overlapping 10-kb windows across the genome and then merged adjacent windows, but I'm not sure why they merged them or how to do it. For example, in the Methods section of Reference 1: "Mean XP-CLR likelihood scores were calculated within nonoverlapping 10-kb sliding windows. Adjacent windows with an average score in the top 20% of the genome-wide average were merged and were further combined if two windows were separated by only one window with a lower score. The maximum window-wise XP-CLR scores were assigned to the merged region as the region-wise score and those with region-wise scores in the top 10% were considered as candidate selective regions."

Can you give me some suggestions? Why did the authors merge adjacent windows, and how can I do the same?

Reference 1: Whole-genome resequencing of 445 Lactuca accessions reveals the domestication history of cultivated lettuce; https://doi.org/10.1038/s41588-021-00831-0

May 22 '24 14:05 lvqiang0120

I think they mean the windows were merged for reporting, i.e. to define candidate regions under selection, rather than for the analysis itself.

That is, if two adjacent windows both have high XPCLR values, they are combined into a single region for reporting.
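
For illustration, here is a minimal Python sketch of the merging procedure quoted from Reference 1. It is one reading of that Methods text, not the authors' code and not part of the xpclr package. Assumptions: the windows are sorted by chromosome and start, are contiguous and equally sized, and carry the value from the xpclr column; "top 20%" is taken as the top 20% of all window scores; "top 10%" is taken as the top 10% of merged regions. The names `Window`, `Region`, `quantile_threshold`, `merge_windows`, and `candidate_regions` are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Window(NamedTuple):
    chrom: str
    start: int
    stop: int
    score: Optional[float]  # xpclr value; None if the window has no score


class Region(NamedTuple):
    chrom: str
    start: int
    stop: int
    score: float  # maximum window score inside the merged region


def quantile_threshold(values: List[float], top_fraction: float) -> float:
    """Return the smallest value that still falls in the top `top_fraction`."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    return ranked[n_top - 1]


def merge_windows(windows: List[Window], top_fraction: float = 0.20,
                  max_gap: int = 1) -> List[Region]:
    """Merge high-scoring windows, bridging at most `max_gap` lower windows."""
    scored = [w.score for w in windows if w.score is not None]
    if not scored:
        return []
    cutoff = quantile_threshold(scored, top_fraction)

    regions: List[Region] = []
    current: Optional[Region] = None
    gap = 0  # consecutive low-scoring windows since the last high one
    for w in windows:
        # Never merge across chromosomes.
        if current is not None and w.chrom != current.chrom:
            regions.append(current)
            current = None
        high = w.score is not None and w.score >= cutoff
        if high:
            if current is None:
                current = Region(w.chrom, w.start, w.stop, w.score)
            else:
                # Extend the region and keep the maximum window score.
                current = Region(current.chrom, current.start, w.stop,
                                 max(current.score, w.score))
            gap = 0
        elif current is not None:
            gap += 1
            if gap > max_gap:
                # Too many low windows in a row: close the region.
                regions.append(current)
                current = None
    if current is not None:
        regions.append(current)
    return regions


def candidate_regions(regions: List[Region],
                      top_fraction: float = 0.10) -> List[Region]:
    """Keep regions whose region-wise score is in the top `top_fraction`."""
    if not regions:
        return []
    cutoff = quantile_threshold([r.score for r in regions], top_fraction)
    return [r for r in regions if r.score >= cutoff]
```

To use it, build one `Window` per row of the result file (chrom, start, stop, xpclr), call `merge_windows`, and pass the result to `candidate_regions`. A region ends at its last high-scoring window, so trailing low-scoring windows are not included. The thresholds here are rank-based and ties at the cutoff are all kept, so slightly more than the nominal fraction can pass.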

May 22 '24 20:05 hardingnj