LDhat icon indicating copy to clipboard operation
LDhat copied to clipboard

rhomap results with extremely high 4Ner/kb

Open masen1991 opened this issue 2 years ago • 9 comments

Thank you for your software.I'm trying to use software to calculate the hotpots of recombination in the MHC region. THE SNP data of chr6 MHC region of YRI population in KGP were used. After PHASED, I used rhomap to directly calculate the data of 5MB region.

The running parameters are as follows:

rhomap -lk new_lk.txt -burn 100000 -its 1100000 -samp 100 -seq ldhat.sites -loc ldhat.locs

The likelihood lookup files used n=192, theta=0.001 per site file. After the calculation,i want to get the hotpots region in MHC region,I have the following questions。 1.can i use summary.txt to draw the results or do i need to use stat on the rates.txt to get res.txt and then to draw a plot? 2.Do you have some recommended parameters with rhomap?Because in LDhat.2.2 manual there is only recommended parameters in interval. 3.And the weirdest thing is, my results are extremely high with 4Ner/kb。In some region it can reach 867 (4Ner/kb).And i try rhomap on annother population data i even see 2000 (4Ner/kb).Is the numerical value of this result accurate?And I used whole 5mb region of mhc snps to calculate the recombination rate ,do i need to split to small region ?

masen1991 avatar May 25 '22 02:05 masen1991

Some thoughts below; although I will caution that it has been many years since I have thought about these issues.

1.can i use summary.txt to draw the results or do i need to use stat on the rates.txt to get res.txt and then to draw a plot?

For rhomap, you should be able to use the summary.txt file.

2.Do you have some recommended parameters with rhomap?Because in LDhat.2.2 manual there is only recommended parameters in interval.

From memory, the default parameters are reasonably for human data.

3.And the weirdest thing is, my results are extremely high with 4Ner/kb。In some region it can reach 867 (4Ner/kb).And i try rhomap on annother population data i even see 2000 (4Ner/kb).Is the numerical value of this result accurate?And I used whole 5mb region of mhc snps to calculate the recombination rate ,do i need to split to small region ?

In some respects, this isn't particularly strange. For regions with high recombination rates, the point estimate of the recombination rate is very difficult to estimate. One way to think about this is as the recombination rate in a hotspot gets higher, then SNPs on either side of the region become increasingly uncorrelated, and therefore don't contain much information about the recombination rate in the hotspot.

If you are concerned about the rate estimates from rhomap, I think it is sensible to continue to use interval. In general, interval is a more robust algorithm for rate estimation (although it doesn't allow inference of hotspot locations as easily).

— Reply to this email directly, view it on GitHub https://github.com/auton1/LDhat/issues/21, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABEVMYZ32CNNW62YM4ZZGGDVLWDSDANCNFSM5W3P3ZOQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

-- Adam Auton

auton1 avatar May 26 '22 16:05 auton1

Thank you very much for your reply, which is of great help to me

masen1991 avatar May 27 '22 02:05 masen1991

@auton1 I want to confirm the accuracy of the following results. i plot YRI(with 96 samples as n=192 ) interval and rhomap as follows: Screen Shot 2022-06-05 at 3 43 13 PM and rhomap : Screen Shot 2022-06-05 at 3 43 53 PM It all rely on MHC region. 1.Can both plot tell the recombination hot spots region? 2.with out Ne and if i only want to find the region of recombination hot spots,which would u recommended to use?

masen1991 avatar Jun 05 '22 07:06 masen1991

This looks somewhat more discordant than I would expect. What parameters are you using? I also note that the units on the y-axis appear different.

On Sun, 5 Jun 2022 at 00:49, masen0407 @.***> wrote:

@auton1 https://github.com/auton1 I want to confirm the accuracy of the following results. i plot YRI(with 96 samples as n=192 ) interval and rhomap as follows: [image: Screen Shot 2022-06-05 at 3 43 13 PM] https://user-images.githubusercontent.com/55567673/172040780-8ac42366-1dbd-4e55-ab5b-da8d9d0d9c6e.png and rhomap : [image: Screen Shot 2022-06-05 at 3 43 53 PM] https://user-images.githubusercontent.com/55567673/172040779-bb739cd4-73fb-4d71-904b-4f5233437f14.png It all rely on MHC region. 1.Can both plot tell the recombination hot spots region? 2.with out Ne and if i only want to find the region of recombination hot spots,which would u recommended to use?

— Reply to this email directly, view it on GitHub https://github.com/auton1/LDhat/issues/21#issuecomment-1146758727, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABEVMY76NXVEQGSVZKHABITVNRLXPANCNFSM5W3P3ZOQ . You are receiving this because you were mentioned.Message ID: @.***>

-- Adam Auton

auton1 avatar Jun 05 '22 17:06 auton1

@auton1 i use vcftools to get ldhat.sites and ldhat.locs lkgen -lk lk_n192_t0.001 -nseq 96 to get a new new_lk.txt with rhomap: rhomap -lk new_lk.txt -burn 100000 -its 1000000 -samp 100 -seq ldhat.sites -loc ldhat.locs with interval: interval -lk new_lk.txt -samp 2000 -its 1000000 -bpen 5 -seq ldhat.sites -loc ldhat.locs stat -input rates.txt And yes ,the y-axis appear different so i want to know why the rhomap results with extremely high 4Ner/kb.is it right ?

masen1991 avatar Jun 06 '22 00:06 masen1991

The recombination rate "hotspots" correspond to segmental duplication loci. These might be violating the model assumptions. On the other hand, it is highly possible based on what we're seeing in the HPRC that there is a lot of recombination-like activity in these loci.

This is the subgraph of the MHC in the HPRC pggb graph:

Screenshot from 2022-06-06 10-28-49

It approximately corresponds to your reference range.

From annotation using gfaestus, it's clear that the big bubble, which would correspond to the highest peak in your plot, corresponds to the MHC Class-II genes.

Screenshot from 2022-06-06 10-28-46

This is taken from three slides starting here https://docs.google.com/presentation/d/1qDSHpi1i2esmnIuiBA0g5EOGzvq8jUnkIw1yFFrp4hw/edit#slide=id.g12e89fff832_0_117

ekg avatar Jun 06 '22 08:06 ekg

@ekg @auton1 Thank you very much for your explanation. I know that MHC Class-II genes may be very complicated to count the recombination.What I want to do is make sure that the results make sense. As u can see, in the other region ,such as up from 30000kb and the region between 31-32 mb ,they may be MHC Class-I genes(HLA-A/C/B) in this region ,so do the peak also means that recombination hot spots in that areas?

Thers are many paper use LDhat to count the recombination.Such as https://www.nature.com/articles/ng1885 and https://doi.org/10.1016/j.jgg.2022.03.006. Screen Shot 2022-06-07 at 10 19 14 AM

They may end up very different from my rhomap plot.How to find a population-specific recombination hot spots region(may not need to be format as cM/Mb,but just to confirm,where have a higher recombination rate)?

masen1991 avatar Jun 07 '22 02:06 masen1991

@auton1 I have another question. To convert result of rho to centimorgan,as i see on https://github.com/auton1/LDhat/issues/18 if i use likelihood lookup file of n=192, theta=0.001 per site,and just let's assume the population size is 10000(Ne = 10000). To convert from 4 Ne r / kb to cM / Mb,So if a region have a recombination rate of x (in units of 4 Ne r / kb), this would be 2.5 x in units of cM /Mb . Is that correct? And Do you have a recommended method for calculating Ne (How can I quickly calculate the Ne required for the conversion)?

masen1991 avatar Jun 15 '22 02:06 masen1991

For starters, I suggest converting the interval output to rho / kb (i.e. scaling by physical distance, in kb, between SNPs), which may resolve some of the discrepancies.

I'd also suggest increasing the hotspot penalty in rhomap.

Is your lkgen command correct? You said you had 96 samples, which would imply 192 haplotypes? (Is your data phased?)

On Sun, 5 Jun 2022 at 17:47, masen0407 @.***> wrote:

i use vcftools to get ldhat.sites and ldhat.locs lkgen -lk lk_n192_t0.001 -nseq 96 to get a new new_lk.txt with rhomap: rhomap -lk new_lk.txt -burn 100000 -its 1000000 -samp 100 -seq ldhat.sites -loc ldhat.locs with interval: interval -lk new_lk.txt -samp 2000 -its 1000000 -bpen 5 -seq ldhat.sites -loc ldhat.locs stat -input rates.txt And yes ,the y-axis appear different so i want to know the rhomap results with extremely high 4Ner/kb

— Reply to this email directly, view it on GitHub https://github.com/auton1/LDhat/issues/21#issuecomment-1146923520, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABEVMY2DL5U6PNUPIEGVSU3VNVDAPANCNFSM5W3P3ZOQ . You are receiving this because you were mentioned.Message ID: @.***>

-- Adam Auton

auton1 avatar Oct 11 '22 07:10 auton1