CMplot Circular Manhattan with only a subset of variants

Open swvanderlaan opened this issue 4 years ago • 8 comments

I have a subset of variants I want to plot: only about 3,000, all on one chromosome, for 7 traits. I would like to make a CMplot for them, and the call below runs without problems.

CMplot(dt.temp, plot.type = "c",
         bin.size = binsize,
         r = 0.25,
         outward = TRUE,
         cir.chr = TRUE,
         cir.chr.h = 1.25,
         cir.legend = TRUE,
         cir.legend.cex = 0.5,
         box = TRUE,
         cex.lab = 0.5,
         multracks = TRUE,
         main = var.temp,
         height = 20, width = 20, 
         file.output = TRUE,
         verbose = TRUE)

But the result looks like the graph below.

[Figure: Circular-Manhattan plot of XXL_VLDL_P_Z, XXL_VLDL_L_Z, XXL_VLDL_PL_Z, XXL_VLDL_C_Z, XXL_VLDL_CE_Z, XXL_VLDL_FC_Z, XXL_VLDL_TG_Z]

How can I change this? Obviously I only want to plot the actual range of this particular dataset.

May 08 '20 15:05 swvanderlaan

I dove into the R code, but it's hard to find what I am looking for. I also tested this with a random range within the dummy dataset provided with the function, and it doesn't work there either. So apparently the range is hard-coded? It would be great if you could point me to a quick-and-dirty solution.

May 08 '20 20:05 swvanderlaan

Thanks for your feedback. The result looks quite strange, and I haven't seen this before. My guess is that it is caused either by a large number of NAs in the columns of your data or by an incorrect data format. Could you please check?

May 10 '20 08:05 YinLiLin

Ooh, good point. So here are the data, and here is a summary of the p-values for each trait using R's summary():

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.002882 0.361270 0.662353 0.597973 0.853741 0.999606 
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.002667 0.365137 0.663248 0.601604 0.866840 0.999706 
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
0.002899 0.360100 0.657978 0.596601 0.845382 0.998819        2 
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.001806 0.391159 0.706977 0.624822 0.891210 0.999378 
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's 
0.0009095 0.4077800 0.6566510 0.6120924 0.8568850 0.9995300         1 
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.002787 0.365494 0.655594 0.598149 0.857637 0.999537 
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.002963 0.355814 0.655353 0.596316 0.856539 0.998847

There are some NAs. Could these be causing the issue?

Attachment: dt.temp.txt
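
A check along these lines can be sketched in base R. This is only an illustration: `dt.demo` is a small made-up stand-in for `dt.temp`, and the column names `SNP`, `Chromosome`, `Position`, `trait1` and `trait2` are assumptions, not the actual layout of the attached file.

```r
# Hypothetical stand-in for dt.temp: marker, chromosome, position,
# then one p-value column per trait (all names are assumptions).
dt.demo <- data.frame(
  SNP        = paste0("rs", 1:5),
  Chromosome = 7,
  Position   = c(100, 200, 300, 400, 500),
  trait1     = c(0.5, NA, 0.01, 0.2, 0.9),
  trait2     = c(0.3, 0.4, NA, 0.6, 0.7)
)

# Every column that is not an identifier is treated as a trait column.
trait.cols <- setdiff(names(dt.demo), c("SNP", "Chromosome", "Position"))

# Number of missing p-values per trait.
na.counts <- colSums(is.na(dt.demo[, trait.cols, drop = FALSE]))

# Check that position and p-value columns are numeric, as a basic format test.
all.numeric <- all(vapply(dt.demo[, c("Position", trait.cols)], is.numeric, logical(1)))

# Keep only the variants with a p-value for every trait.
dt.complete <- dt.demo[complete.cases(dt.demo[, trait.cols, drop = FALSE]), ]

print(na.counts)
print(all.numeric)
print(nrow(dt.complete))
```

Plotting `dt.complete` instead of the full table would show whether the missing values have any effect on the figure.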

May 11 '20 14:05 swvanderlaan

Got a little further with this. It's not NAs or duplicate rows; rather, there seems to be a fixed assumption about how many variants CMplot expects to plot, meaning there is a hard (?) limit on the beginning and end of the circle (the xlim).

If I do this:

> data("pig60K")
>   summary(pig60K$Position)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
        0  19294178  43635982  57061842  78129251 295554054 
>   dim(pig60K)
[1] 44580     6
>   CMplot(pig60K[pig60K$Chromosome==7,], plot.type = "c", outward = TRUE, memo = "original")
 Circular_Manhattan Plotting trait1_original.
 Circular_Manhattan Plotting trait2_original.
 Circular_Manhattan Plotting trait3_original.
 Plots are stored in: /Users/swvanderlaan/PLINK/analyses/ucorbio/UCORBIO_7Q22 
>   pig60K.temp <- pig60K[pig60K$Chromosome==7 & (pig60K$Position > 1929500 & pig60K$Position < 2929500),]
>   summary(pig60K.temp$Position)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
1964356 2244271 2473484 2455343 2685658 2894421 
>   dim(pig60K.temp)
[1] 32  6
>   CMplot(pig60K.temp, plot.type = "c", outward = TRUE, memo = "test1")
 Circular_Manhattan Plotting trait1_test1.
 Circular_Manhattan Plotting trait2_test1.
 Circular_Manhattan Plotting trait3_test1.
 Plots are stored in: /Users/swvanderlaan/PLINK/analyses/ucorbio/UCORBIO_7Q22 
>   pig60K.temp <- pig60K[pig60K$Chromosome==7 & (pig60K$Position > 1929500 & pig60K$Position < 2283393),]
>   summary(pig60K.temp$Position)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
1964356 2051549 2102745 2112845 2188069 2249485 
>   dim(pig60K.temp)
[1] 9 6
>   CMplot(pig60K.temp, plot.type = "c", outward = TRUE, memo = "test2")
 Circular_Manhattan Plotting trait1_test2.
 Circular_Manhattan Plotting trait2_test2.
 Circular_Manhattan Plotting trait3_test2.

I get three figures that are quite distinct.

The original data for chromosome 7.

[Figure: Circular-Manhattan plot of trait1_original, trait2_original, trait3_original]

The first test with chromosome 7: pig60K[pig60K$Chromosome==7 & (pig60K$Position > 1929500 & pig60K$Position < 2929500),]

[Figure: Circular-Manhattan plot of trait1_test1, trait2_test1, trait3_test1]

The second test with chromosome 7: pig60K[pig60K$Chromosome==7 & (pig60K$Position > 1929500 & pig60K$Position < 2283393),]

[Figure: Circular-Manhattan plot of trait1_test2, trait2_test2, trait3_test2]

Where in the code can I set the beginning and end of the circle, so that they are based on the start and end positions of the input region?

May 11 '20 21:05 swvanderlaan

Many thanks for your patience and care in pointing out this issue; it is very helpful. From your test, my first instinct is that CMplot always starts the plot at 0 for each chromosome rather than at the minimum position, which may be inappropriate in some cases, especially when plotting only a region of a chromosome. I will tweak the script to address this in the next few days and will report back here when it is done. Thanks again. Cheers!
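
If the plot really does start at 0 for each chromosome, one quick-and-dirty workaround on the user side would be to shift the positions so that each chromosome's smallest position becomes 0 before plotting. The sketch below is not part of CMplot: `shift_to_region_start` is a hypothetical helper, the column names follow the pig60K example above, and whether it actually tightens the circle depends on that guess about the cause being right. Note that positions in the plot would then be relative to the region start, not absolute.

```r
# Hypothetical helper (not a CMplot function): subtract the per-chromosome
# minimum position so that each chromosome's region starts at 0.
shift_to_region_start <- function(dat, chr.col = "Chromosome", pos.col = "Position") {
  # as.character() avoids empty groups from unused factor levels.
  chr.min <- ave(dat[[pos.col]], as.character(dat[[chr.col]]), FUN = min)
  dat[[pos.col]] <- dat[[pos.col]] - chr.min
  dat
}

# Made-up rows with the same layout as the pig60K subset used above.
region <- data.frame(
  SNP        = c("m1", "m2", "m3"),
  Chromosome = c(7, 7, 7),
  Position   = c(1964356, 2051549, 2249485),
  trait1     = c(0.40, 0.02, 0.75)
)

shifted <- shift_to_region_start(region)
print(shifted$Position)

# The shifted table would then be passed to the same call as in the thread:
# CMplot(shifted, plot.type = "c", outward = TRUE, memo = "shifted")
```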

May 13 '20 01:05 YinLiLin

Thanks. No problem - am happy to help.

Did you make any progress by any chance?

Jun 09 '20 19:06 swvanderlaan

Sorry for the late update. I have fixed the bug you reported here. The latest version is available on GitHub; you can try it by sourcing the script. I will submit it to CRAN in the coming days. Thanks again for the feedback.

Jun 26 '20 07:06 YinLiLin

Ah nice. I will check it out and will circle back!

Jun 30 '20 22:06 swvanderlaan
