CMplot
Circular Manhattan with only a subset of variants
I have a subset of variants I want to plot: only about 3,000 on 1 chromosome, and I have 7 traits. So I would like to make a CMplot; of course that works nicely.
CMplot(dt.temp, plot.type = "c",
bin.size = binsize,
r = 0.25,
outward = TRUE,
cir.chr = TRUE,
cir.chr.h = 1.25,
cir.legend = TRUE,
cir.legend.cex = 0.5,
box = TRUE,
cex.lab = 0.5,
multracks = TRUE,
main = var.temp,
height = 20, width = 20,
file.output = TRUE,
verbose = TRUE)
But it looks like the graph below.
How can I change this? Obviously I only want to plot the actual range of this particular dataset.
I dove into the R code, but it's hard to find what I am looking for. I tested this also with a random range within the dummy-dataset provided with the function: it doesn't work either. So apparently the range is hard-coded? Would be great if you could point to a quick-and-dirty-solution.
Thanks for your feedback.
The result looks quite strange; I haven't seen this before. My guess is that the cause is either a large number of NAs in the columns of your data or an incorrect data format. Could you please check?
Ooh, good point. So here are the data. And here is a summary of the p-values for each trait using R's summary():
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.002882 0.361270 0.662353 0.597973 0.853741 0.999606
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.002667 0.365137 0.663248 0.601604 0.866840 0.999706
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.002899 0.360100 0.657978 0.596601 0.845382 0.998819 2
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.001806 0.391159 0.706977 0.624822 0.891210 0.999378
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.0009095 0.4077800 0.6566510 0.6120924 0.8568850 0.9995300 1
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.002787 0.365494 0.655594 0.598149 0.857637 0.999537
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.002963 0.355814 0.655353 0.596316 0.856539 0.998847
There are some NA's. Potentially these cause the issue? dt.temp.txt
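To rule NAs in or out, one could count them per column and rerun the plot on complete rows only. Below is a minimal sketch in base R. The toy data frame, its column names and its values are invented for illustration and are not the attached dt.temp.txt.

```r
# Toy stand-in for the real input: SNP, chromosome, position, then one
# p-value column per trait. All values here are invented for illustration.
toy <- data.frame(
  SNP        = paste0("rs", 1:5),
  Chromosome = rep(7, 5),
  Position   = c(100, 200, 300, 400, 500),
  trait1     = c(0.50, NA, 0.03, 0.80, 0.20),
  trait2     = c(0.10, 0.40, NA, 0.90, 0.60)
)

# Number of missing values in each column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# Keep only rows with no missing value in any column.
toy_complete <- toy[complete.cases(toy), ]
print(nrow(toy_complete))
```

The filtered data frame could then be passed to CMplot() with the same arguments as before to see whether the plot changes.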
Got a little further with this. It's not NAs or duplicate rows; rather, there seems to be a limit on the range of variants CMplot expects to plot. Meaning there is a hard (?) limit on the beginning and end of the circle, xlim.
If I do this:
> data("pig60K")
> summary(pig60K$Position)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 19294178 43635982 57061842 78129251 295554054
> dim(pig60K)
[1] 44580 6
> CMplot(pig60K[pig60K$Chromosome==7,], plot.type = "c", outward = TRUE, memo = "original")
Circular_Manhattan Plotting trait1_original.
Circular_Manhattan Plotting trait2_original.
Circular_Manhattan Plotting trait3_original.
Plots are stored in: /Users/swvanderlaan/PLINK/analyses/ucorbio/UCORBIO_7Q22
> pig60K.temp <- pig60K[pig60K$Chromosome==7 & (pig60K$Position > 1929500 & pig60K$Position < 2929500),]
> summary(pig60K.temp$Position)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1964356 2244271 2473484 2455343 2685658 2894421
> dim(pig60K.temp)
[1] 32 6
> CMplot(pig60K.temp, plot.type = "c", outward = TRUE, memo = "test1")
Circular_Manhattan Plotting trait1_test1.
Circular_Manhattan Plotting trait2_test1.
Circular_Manhattan Plotting trait3_test1.
Plots are stored in: /Users/swvanderlaan/PLINK/analyses/ucorbio/UCORBIO_7Q22
> pig60K.temp <- pig60K[pig60K$Chromosome==7 & (pig60K$Position > 1929500 & pig60K$Position < 2283393),]
> summary(pig60K.temp$Position)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1964356 2051549 2102745 2112845 2188069 2249485
> dim(pig60K.temp)
[1] 9 6
> CMplot(pig60K.temp, plot.type = "c", outward = TRUE, memo = "test2")
Circular_Manhattan Plotting trait1_test2.
Circular_Manhattan Plotting trait2_test2.
Circular_Manhattan Plotting trait3_test2.
I get three figures that are quite distinct.
The original data for chromosome 7.
The first test with chromosome 7: pig60K[pig60K$Chromosome==7 & (pig60K$Position > 1929500 & pig60K$Position < 2929500),]
The second test with chromosome 7: pig60K[pig60K$Chromosome==7 & (pig60K$Position > 1929500 & pig60K$Position < 2283393),]
Where in the code can I determine the beginning and end of the circle and base it on the input starting and ending position for a given region?
Many thanks for your patience and care in pointing out this issue; very helpful. From your test, my first instinct is that CMplot always starts the plot at 0 for each chromosome rather than at the minimum position, which may be inappropriate in some cases, especially when plotting only a region of a chromosome. I will tweak the script to address this in the next few days and will report back here when it is done. Thanks again. Cheers!
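If the plot really does start at 0 for each chromosome, a quick-and-dirty workaround would be to shift the positions before plotting, so that the first variant of each chromosome sits at position 1. The sketch below is only that: rebase_positions() is a hypothetical helper, not part of CMplot, and it assumes the pig60K-style column names Chromosome and Position. The toy positions are taken from the summary of the second test above; the p-values are made up. Whether this fully fixes the circular layout in the affected CMplot version is untested here, and any position labels would then be relative to the region start.

```r
# Hypothetical helper (not part of CMplot): subtract each chromosome's
# minimum position so the plotted range starts at 1 instead of far from 0.
rebase_positions <- function(df, chr_col = "Chromosome", pos_col = "Position") {
  # ave() returns, for every row, the minimum position of that row's chromosome.
  min_per_chr <- ave(df[[pos_col]], df[[chr_col]],
                     FUN = function(p) min(p, na.rm = TRUE))
  df[[pos_col]] <- df[[pos_col]] - min_per_chr + 1
  df
}

# Toy region on chromosome 7; the p-values are invented.
toy <- data.frame(
  SNP        = paste0("snp", 1:4),
  Chromosome = c(7, 7, 7, 7),
  Position   = c(1964356, 2051549, 2102745, 2249485),
  trait1     = c(0.50, 0.01, 0.30, 0.90)
)

rebased <- rebase_positions(toy)
print(rebased$Position)
```

The rebased data frame could then be passed to CMplot() in place of the original subset, for example with plot.type = "c" as in the tests above.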
Thanks. No problem - am happy to help.
Did you make any progress by any chance?
Sorry for the late update. I have fixed the bug you reported here. The latest version is available on GitHub; you can try it by loading it with source(). I will submit it to CRAN in the coming days. Thanks again for the feedback.
Ah nice. I will check it out and will circle back!