heatmaply

More general correlations without assuming linearity

Open hdvinod opened this issue 8 years ago • 7 comments

There is a new R package called generalCorr with a simple function, gmcmtx0. For example, gmcmtx0(mtcars) produces an 11 by 11 matrix of generalized correlation coefficients. Note that if r_(Xi | Xj) exceeds r_(Xj | Xi), then Xj is likely the cause of Xi.

It would be nice if one could view generalized correlation coefficients, which are asymmetric and always larger in absolute value than Pearson correlation coefficients.

hdvinod avatar May 31 '16 23:05 hdvinod

So there is no problem doing it if you are willing to not use the dendrograms:

install.packages("generalCorr")
install.packages("RColorBrewer")

# get the correlation:
library("generalCorr")
library("heatmaply") # also attaches plotly, which provides %>% and layout()
x <- gmcmtx0(mtcars)
# prepare some nice colors:
BrBG <- colorRampPalette(RColorBrewer::brewer.pal(11, "BrBG"))
heatmaply(x, Rowv = FALSE, Colv = FALSE,
    colors = BrBG, limits = c(-1, 1)) %>%
    layout(margin = list(l = 40, b = 40))

image

However, if you want to also have the dendrograms, the problem is that it may not be possible to have the same ordering in the two groups since the values of the matrix are not symmetrical (so the dendrograms are different, and their different topologies may not allow the two to have the same order as we would like).

What do you think?

talgalili avatar Jun 01 '16 09:06 talgalili

Can we give priority to the causal side, represented by the numbers above the diagonal?


hdvinod avatar Jun 01 '16 15:06 hdvinod

Dear Tal

I wanted to try to focus on the above-diagonal correlation coefficients, since they represent the cause variable in a binary setting.

I could not reproduce your plot on my Windows PC. I must have done something wrong with the colors.

heatmaply(x, Rowv=FALSE, Colv= FALSE,
    colors = BrBG , limits = c(-1,1)) %>%   layout(margin = list(l = 40, b = 40))
Error in as.character(col) : cannot coerce type 'closure' to vector of type 'character'

It does work after removing colors = BrBG.
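The error message above is consistent with colorRampPalette() returning a function (a closure) rather than a vector of colors. A minimal sketch of one workaround, assuming the installed heatmaply version only accepts a color vector for `colors`: call the palette function first and pass its result. The three hex colors are a stand-in for RColorBrewer's BrBG palette, and cor(mtcars) is a stand-in for gmcmtx0(mtcars), so that the sketch runs with base R alone.

```r
# colorRampPalette() returns a function, not a vector of colors.
# The three hex colors approximate the ends and middle of the BrBG palette
# (an assumption made here so that RColorBrewer is not required).
BrBG <- colorRampPalette(c("#543005", "#F5F5F5", "#003C30"))
is.function(BrBG)        # TRUE: this is the 'closure' in the error message

# Calling the function yields a character vector of colors.
brbg_cols <- BrBG(256)
is.character(brbg_cols)  # TRUE

# Stand-in matrix with values in [-1, 1]; replace with gmcmtx0(mtcars).
x <- cor(mtcars)

# Optional: build the heatmap only if heatmaply is installed.
if (requireNamespace("heatmaply", quietly = TRUE)) {
  p <- heatmaply::heatmaply(x, Rowv = FALSE, Colv = FALSE,
                            colors = brbg_cols, limits = c(-1, 1))
  p <- plotly::layout(p, margin = list(l = 40, b = 40))
  # print(p) displays the plot in an interactive session
}
```

With the original palette from this thread, the equivalent change would be colors = BrBG(256) in place of colors = BrBG.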

How can one control the ordering of variables in the plots? I would like to focus on the difference in absolute values, |r_ij| - |r_ji|. If this is positive, then the j-th column is the cause.

Please help. Thanks.


Hrishikesh (Rick) D. Vinod Professor of Economics, Fordham University E-Mail: [email protected] Tel 718-817-4065, Secretary 718-817-4048, Fax 718-817-3518 Web page: http://www.fordham.edu/economics/vinod ResearchGate says my papers have been cited 2162 times in various research publications.

hdvinod avatar Jun 01 '16 17:06 hdvinod

@hdvinod you can control ordering with Rowv and Colv (using vectors of integers, i.e. indexes). If you can provide some example code of how to use these correlations, I'd be happy to consider incorporating it in the package; otherwise I'd consider closing this.

alanocallaghan avatar Aug 19 '19 12:08 alanocallaghan

As you can see from the following example, generalized correlations are very easy to compute and report with one line of code. Yes, the rows and columns can be ordered simply by ordering the columns of the input data matrix.

See the mtcars example below.


install.packages("generalCorr")
library(generalCorr)
options(np.messages=FALSE)
mymtcars=mtcars[,1:5]
rstar=gmcmtx0(mymtcars)
rstar
neword=c(1,4,2,3,5)
rstar2=gmcmtx0(mymtcars[,neword])
rstar2

OUTPUT

rstar
            mpg        cyl       disp         hp       drat
mpg   1.0000000 -0.8557900 -0.9508994 -0.9379374  0.6845546
cyl  -0.9433125  1.0000000  0.9759183  0.9583212 -0.7512495
disp -0.8941676  0.9151419  1.0000000  0.9306311 -0.7697372
hp   -0.8530474  0.8446589  0.8170031  1.0000000 -0.5542799
drat  0.6878267 -0.7015970 -0.9458881 -0.7434288  1.0000000

neword=c(1,4,2,3,5)
rstar2=gmcmtx0(mymtcars[,neword])
rstar2
            mpg         hp        cyl       disp       drat
mpg   1.0000000 -0.9379374 -0.8557900 -0.9508994  0.6845546
hp   -0.8530474  1.0000000  0.8446589  0.8170031 -0.5542799
cyl  -0.9433125  0.9583212  1.0000000  0.9759183 -0.7512495
disp -0.8941676  0.9306311  0.9151419  1.0000000 -0.7697372
drat  0.6878267 -0.7434288 -0.7015970 -0.9458881  1.0000000
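In the output above, rstar2 contains the same values as rstar with rows and columns permuted. A small sketch, using the posted numbers as literal data (so generalCorr is not needed), suggests the reordering can also be done by indexing the matrix already computed instead of calling gmcmtx0 a second time:

```r
# The rstar values posted above for mtcars[, 1:5], entered as literal data.
vars <- c("mpg", "cyl", "disp", "hp", "drat")
rstar <- matrix(c(
   1.0000000, -0.8557900, -0.9508994, -0.9379374,  0.6845546,
  -0.9433125,  1.0000000,  0.9759183,  0.9583212, -0.7512495,
  -0.8941676,  0.9151419,  1.0000000,  0.9306311, -0.7697372,
  -0.8530474,  0.8446589,  0.8170031,  1.0000000, -0.5542799,
   0.6878267, -0.7015970, -0.9458881, -0.7434288,  1.0000000),
  nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Apply the same permutation to rows and columns.
neword <- c(1, 4, 2, 3, 5)
rstar_reordered <- rstar[neword, neword]
rstar_reordered
```

The reordered matrix can then be passed to heatmaply with Rowv = FALSE and Colv = FALSE, as in the first reply, so that the plot keeps the chosen order.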

hdvinod avatar Aug 19 '19 14:08 hdvinod

Thanks, but what order criteria would you apply with this? I'm slightly unclear what you mean about |rji| - |rij|.

If I'm understanding correctly, this can only be used for ordering the rows/columns (via Rowv or Colv), and not for computing dendrograms (due to asymmetry).
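For reference, the quantity being discussed can be computed directly from the rstar matrix posted earlier in this thread. The sketch below enters those values as literal data and forms d[i, j] = |r_ij| - |r_ji|. The final ordering by column sums is a hypothetical criterion introduced only for illustration; it is not something proposed in the thread or provided by generalCorr.

```r
# The rstar values posted earlier for mtcars[, 1:5], entered as literal data.
vars <- c("mpg", "cyl", "disp", "hp", "drat")
rstar <- matrix(c(
   1.0000000, -0.8557900, -0.9508994, -0.9379374,  0.6845546,
  -0.9433125,  1.0000000,  0.9759183,  0.9583212, -0.7512495,
  -0.8941676,  0.9151419,  1.0000000,  0.9306311, -0.7697372,
  -0.8530474,  0.8446589,  0.8170031,  1.0000000, -0.5542799,
   0.6878267, -0.7015970, -0.9458881, -0.7434288,  1.0000000),
  nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# d[i, j] = |r_ij| - |r_ji|; the matrix is antisymmetric with a zero diagonal.
d <- abs(rstar) - abs(t(rstar))
round(d, 3)

# Pairs with d[i, j] > 0. By the rule stated earlier in the thread,
# the column variable is then the likely cause of the row variable.
pos <- which(d > 0, arr.ind = TRUE)
data.frame(row = vars[pos[, "row"]],
           column = vars[pos[, "col"]],
           diff = d[pos])

# Hypothetical ordering criterion (an assumption, not from the thread):
# sort variables by how strongly they appear on the "cause" side overall.
ord <- order(colSums(d), decreasing = TRUE)
vars[ord]
```

An index vector such as ord could be used to reorder the matrix as rstar[ord, ord] before plotting without dendrograms.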

alanocallaghan avatar Aug 20 '19 09:08 alanocallaghan

Dear Talgalili/Heatmaply: You are correct that asymmetry limits the application to dendrograms. Yes, |rji| - |rij| may not be useful here. It is one of three useful indicators of whether Xi causes Xj or vice versa. A summary determination of the causal direction is made in the generalCorr package by the command causeSummary(mtx): it pairs the first column of the matrix mtx with all other columns, and a decision rule reports which is likely to be the cause. One of these days we can talk about causality at length.

In our context of dendrograms, we want to get away from the linearity assumption of correlation coefficients, which can underestimate dependence. Example: with x=1:20; y=sin(x), the simple correlation cor(x,y) is near zero even though x and y are perfectly dependent. gmcmtx0(cbind(x,y)) will give a better estimate of the dependence!
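The sine example can be checked in a few lines. The Pearson part uses base R only; the gmcmtx0 call is the one suggested in the message and is run only if generalCorr happens to be installed, so no output is claimed for it here.

```r
# y is an exact function of x, yet the linear (Pearson) correlation is small.
x <- 1:20
y <- sin(x)
r_pearson <- cor(x, y)
r_pearson   # roughly -0.09

# Optional: the generalized correlations suggested in the thread.
if (requireNamespace("generalCorr", quietly = TRUE)) {
  library(generalCorr)
  options(np.messages = FALSE)  # as in the mtcars example in this thread
  r_general <- gmcmtx0(cbind(x, y))
  print(r_general)
}
```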

If D = distance and C = correlation, dendrograms use D = 1 - C:
high positive correlation will have D = 0
high negative correlation will have D = 2
The same can be achieved using the gmcmtx0 function.

Let sgn denote the sign of the Pearson correlation coefficient between Xi and Xj. Now define C*, the revised correlation, as
C* = sgn * max(|rij|, |rji|)
(we want to keep the sign of rij). Then D* = 1 - C* is my proposal for more meaningful dendrograms.
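A minimal sketch of this proposal, assuming the rstar values posted earlier for mtcars[, 1:5] (entered here as literal data) and taking the sign from the ordinary Pearson correlation of the same columns. The use of hclust with its default linkage is an assumption; the thread does not specify a clustering method.

```r
# The rstar values posted earlier for mtcars[, 1:5], entered as literal data.
vars <- c("mpg", "cyl", "disp", "hp", "drat")
rstar <- matrix(c(
   1.0000000, -0.8557900, -0.9508994, -0.9379374,  0.6845546,
  -0.9433125,  1.0000000,  0.9759183,  0.9583212, -0.7512495,
  -0.8941676,  0.9151419,  1.0000000,  0.9306311, -0.7697372,
  -0.8530474,  0.8446589,  0.8170031,  1.0000000, -0.5542799,
   0.6878267, -0.7015970, -0.9458881, -0.7434288,  1.0000000),
  nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# sgn: sign of the Pearson correlation between each pair of variables.
sgn <- sign(cor(mtcars[, vars]))

# C* = sgn * max(|r_ij|, |r_ji|): symmetric by construction.
c_star <- sgn * pmax(abs(rstar), abs(t(rstar)))

# D* = 1 - C*: 0 for strong positive, 2 for strong negative dependence.
d_star <- 1 - c_star

# Because D* is symmetric, one dendrogram can serve both rows and columns.
hc <- hclust(as.dist(d_star))   # default linkage; an assumption
dend <- as.dendrogram(hc)
round(d_star, 3)
```

Since heatmaply accepts a dendrogram for Rowv and Colv, the same dend object could be supplied for both, for example heatmaply(rstar, Rowv = dend, Colv = dend, limits = c(-1, 1)), which would address the ordering problem raised at the start of the thread.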

I am using notation D and C from http://www.nonlinear.com/support/progenesis/comet/faq/v2.0/dendrogram.aspx

The vertical axis is labelled distance and refers to a distance measure between compounds or compound clusters.

The height of the node can be thought of as the distance value between the right and left sub-branch clusters.

The distance measure between two clusters is calculated as follows:

Please let me know if I can be of further assistance. We need a good example out there so folks can start using the new dendrograms. I hope this answers your e-mail.

Best regards, and congrats and kudos for your leadership in starting R-bloggers.


hdvinod avatar Aug 20 '19 16:08 hdvinod