SNPRelate

snpgdsSlidingWindow winstart= bug

Open: marqueda opened this issue 5 years ago • 3 comments

Dear @zhengxwen,

I am running SNPRelate v1.20.1 and am trying to find the start and end coordinates of the sliding windows I computed with snpgdsSlidingWindow(). Whether I specify winstart=1 or winstart=100,000, or omit winstart entirely, the first window position in the output list (a$chrXXI.pos[1]) is always 12,125. Could there be a bug that prevents the winstart= option from working?

I am running the following command:

a <- snpgdsSlidingWindow(gdsobj=genofile, unit="basepair", winsize=10000, shift=5000, winstart=100000, sample.id=samples, snp.id=snps, verbose=TRUE, as.is="list", FUN=pcadist)

I am also not sure I understand how the windows are defined (maybe this is related to the winstart bug?). The first 25 SNP positions in genofile are 11888, 11922, 11923, 12005, 12050, 12052, 12063, 12064, 12070, 12091, 12104, 12136, 12139, 12149, 12155, 12188, 12295, 12320, 12324, 12347, 12351, 58740, 58745, 58751, 58753. So my expectation for winstart=1 would be that the first sliding window (positions 1-10,000) is returned as an empty value, and that the second and third sliding windows (positions 5,001-15,000 and 10,001-20,000) give the same value, because the same SNPs between positions 11,888 and 12,351 fall in both windows. Instead, only the first window that the function outputs contains a value. Could this be because winstart=1 is ignored and 11,888 is then used as the starting position of the first window?
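The expectation described above can be checked with a few lines of base R. This is only a sketch of the window convention the post assumes (closed intervals of winsize base pairs, anchored at winstart and advanced by shift); it does not call SNPRelate and makes no claim about how snpgdsSlidingWindow() defines its windows internally.

```r
# First 25 SNP positions quoted in the post
pos <- c(11888, 11922, 11923, 12005, 12050, 12052, 12063, 12064, 12070,
         12091, 12104, 12136, 12139, 12149, 12155, 12188, 12295, 12320,
         12324, 12347, 12351, 58740, 58745, 58751, 58753)

winstart <- 1
winsize  <- 10000
shift    <- 5000

# Assumed convention: window k covers [start_k, start_k + winsize - 1]
starts <- seq(from = winstart, by = shift, length.out = 4)
ends   <- starts + winsize - 1

# Count the SNPs that fall inside each window
n_snps <- mapply(function(s, e) sum(pos >= s & pos <= e), starts, ends)

windows <- data.frame(start = starts, end = ends, n_snps = n_snps)
print(windows)
```

Under this assumed convention the first window (1-10,000) is empty, and the second and third windows (5,001-15,000 and 10,001-20,000) contain the same 21 SNPs, which is why the post expects two identical values rather than one.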

As an enhancement request, it would be fantastic if snpgdsSlidingWindow() could output not only the mean coordinate of the positions in each window, but also the start and end coordinates of all sliding windows! That information seems very relevant for "basepair" windows, as it might be important to know whether a window spans the 10,000-20,000 interval or the 11,888-21,888 interval, which is not possible to discern right now.
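To illustrate why explicit start and end coordinates would help, here is a rough base R sketch. The helper window_bounds() is hypothetical and is not part of SNPRelate; it simply lists the non-empty windows for a given anchor position, so that a winstart-anchored layout can be compared with one anchored at the first SNP.

```r
# First 25 SNP positions quoted in the post
pos <- c(11888, 11922, 11923, 12005, 12050, 12052, 12063, 12064, 12070,
         12091, 12104, 12136, 12139, 12149, 12155, 12188, 12295, 12320,
         12324, 12347, 12351, 58740, 58745, 58751, 58753)

# Hypothetical helper (not a SNPRelate function): start/end coordinates and
# SNP counts of all non-empty windows, with the first window starting at `anchor`
window_bounds <- function(pos, winsize, shift, anchor) {
  starts <- seq(from = anchor, to = max(pos), by = shift)
  ends   <- starts + winsize - 1
  n <- vapply(seq_along(starts),
              function(i) sum(pos >= starts[i] & pos <= ends[i]),
              integer(1))
  data.frame(start = starts, end = ends, n_snps = n)[n > 0, ]
}

# Windows anchored at winstart = 1
by_winstart <- window_bounds(pos, winsize = 10000, shift = 5000, anchor = 1)

# Windows anchored at the first SNP position instead
by_first_snp <- window_bounds(pos, winsize = 10000, shift = 5000, anchor = min(pos))

print(by_winstart)
print(by_first_snp)
```

The two layouts place the same SNPs in differently bounded windows, and a mean position alone does not reveal which layout produced a given result.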

Thank you for looking into this! Best regards, David

marqueda, May 22 '20 07:05

I dug a bit deeper: it seems to compute the number of windows correctly. When I enter winstart=1, there are 1,510 windows, but when I enter winstart=1,000,000, there are only 1,310 windows. However, instead of starting at the winstart position, the sliding windows still start at the first SNP position (11,888), and the reduced number of windows is then counted from there, which truncates the sliding window data early. It seems the computation starts at the wrong place, but then computes the right number of windows. I hope this helps to resolve the issue?
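The window counts reported above are consistent with simple arithmetic, as this base R sketch shows. It uses only the numbers quoted in this thread; the assumption that windows are placed every shift base pairs from winstart, and the truncation estimate, reflect the hypothesis in the post rather than confirmed SNPRelate behaviour.

```r
shift <- 5000

# Window counts reported in the post
n_win_start_1   <- 1510   # winstart = 1
n_win_start_1e6 <- 1310   # winstart = 1,000,000

# Window starts that lie below 1,000,000 when counting from 1 in steps of shift
n_skipped <- length(seq(from = 1, to = 1e6 - 1, by = shift))
print(n_skipped)                          # compare with the reported difference
print(n_win_start_1 - n_win_start_1e6)

# If windows nevertheless begin at the first SNP (11,888), as hypothesized,
# the last window start would be:
first_snp <- 11888
last_start_truncated <- first_snp + (n_win_start_1e6 - 1) * shift
last_start_full      <- first_snp + (n_win_start_1 - 1) * shift
print(last_start_full - last_start_truncated)  # base pairs lost at the end
```

Moving winstart from 1 to 1,000,000 removes 200 window starts, matching the reported drop from 1,510 to 1,310 windows. If the windows were nevertheless laid out from 11,888, roughly the last 1,000,000 bp would go uncovered.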

marqueda, May 22 '20 08:05

snpgdsSlidingWindow() does not return the window if it is empty, so the first window starts from 11,888.

zhengxwen, May 24 '20 10:05

Dear @zhengxwen,

Thank you for your reply. It certainly makes sense that no empty window is returned. But is it possible to change the window starting position with the winstart= option? It does not seem to change anything in the output, whether I place it before 11,888 or after it (e.g. 12,000). That is why I suggested the option might not be working as intended. Or am I misunderstanding what the winstart= option is supposed to do?

Thank you and best wishes, David

marqueda, May 25 '20 11:05